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Technical  Findings  and  Major  Accomplishments 

A.  Software 

I.  ARPA  Network 

In  this  contract  period,  a  Network  C  mtrol  Program  was  designed, 
implemented  and  modified  as  required  to  fully  support  the  Host-Host  protocol 
currently  specified  by  the  Network  Working  Group.  Access  to  the  Network  has 
been  extended  to  include  Fortran  and  PL/l  programs  and  On-Line  System  users, 
as  well  as  assembly- language  programs.  A  User  Telnet  has  been  written  in 
accordance  with  the  protocol  adopted  to  handle  such  teletype- like  communica¬ 
tion. 

A  number  of  services  have  been  made  available  to  Network  users.  Two 
separate  specifications  for  access  to  the  On-Line  System  have  been  designed 
and  implemented,  each  supporting  the  graphic  display  features  of  the  system. 

A  remote  job  entry  facility  has  been  written,  providing  Network  users  with 
access  to  UCSB's  batch  processing  facilities.  A  Network  file  system  has 
been  written,  making  on-line,  direct-access  storage  available  to  the  Network 
conmunity.  Finally,  a  teletype-oriented  interface  to  the  On-Line  System 
has  been  Implemented  according  to  the  Telnet  protocol  adopted  by  the  Network 
Working  Group. 

All  but  the  most  recently  coded  of  these  software  projects  have  been 
documented,  and  that  documentation  distributed  to  mesibers  of  the  Network 
conmunity. 

(a)  Network  Control  Program 

During  this  contract  period,  a  Network  Control  Program  (NCP)  was  designed 
and  Implemented.  Supporting  initially  the  Network  Working  Group’s  (NWG's) 
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Host-Host  protocol  specification  as  embodied  in  "Host-Host  Protocol  Document 
No.  1",  it  has  been  appropriately  modified  as  protocol  changes  have  been 
adopted  by  the  NWG,  and  at  this  writing  operates  in  accordance  with  the  most 
recent  prttocol  speeificiation,  adopted  this  July. 

lESB's  NCP  implementation  has  been  thoroughly  documented  in  a  paper 
entitled  "An  NCP  for  the  AT  PA  Network"  (21  December  70,  NIC  5480)  and 
distributed  to  Network  participants  through  the  Network  Information  Center 
(NIC)  at  Stanford  Research  Institute  in  Menlo  Park.  Seme  of  the  material 
contained  in  that  paper,  and  a  description  of  mor*  recent  additions  to  the 
NCP,  ore  briefly  described  in  the  sections  which  follow. 

(1)  Software  Structure 

The  NCP  has  been  implemented  as  a  normal,  batch-mode  Job,  and  is  entirely 
self-contained  and  independent  of  all  other  software  systems  in  the  host, 
including  the  Operating  jjystem  (OS).  While  in  execution,  the  NCP  causes 
itself  to  appear  as  if  it  were  a  part  of  OS.  Upon  initiation,  the  NCP 
instates  itself  as  the  System's  Supervisor  Call  First  Level  Interrupt 
Handler  (SVC  FLIH).  Applications  programs  issue  a  particular  SVC  to 
cocmunicate  with  the  NCP;  as  the  Bystem's  SVC  FLIH,  the  NCP  is  in  a  position 
to  intercept  that  SVC.  In  a  similar  fashion,  the  NCP  instates  itself  as  the 
System's  Input/Output  (I/O)  FLIH,  and  in  this  capacity  intercepts  I/O 
interrupts  from  the  IMP  Interface;  hence,  the  NCP  can  initiate  I/O  operations 
to  the  Interface  using  the  Start  I/O  (Sin)  instruction.  In  its  role  as  SVC 
FLIH,  the  NCP  intercepts  occurrences  of  SVC  34,  the  supervisor  call  employed 
internally  by  OS  to  schedule  operator  commands  for  execution.  Entries  made 
by  the  operator  at  his  console  which  are  prefixed  with  the  character  are 
trapped  by  the  NCP  and  interpreted  as  NCP  operator  commands.  Thus  the 
Bystem's  operator  is  afforded  a  convenient  means  of  affecting  the  NCP's 
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operation  in  those  ways  necessary  to  make  operation  of  the  NCP  feasible  in 
a  System  which  supports  both  time-sharing  and  batch  operations,  in  addition 
to  Network  activity.  Ety  constructing  its  own  Data  Extent  Block  (DEB),  the 
NCP  can  use  dS's  Execute  Channel  Program  (EXCP)  macro  instruction  to  output 
messages  to  the  operator's  console,  a  device  to  which  the  NCP  would  not 
otherwise  have  direct  access.  Using  EXCP,  the  NCP  can  prefix  its  responses 
to  operator  camnands  with  the  character  making  them  easily  recognizable 
to  the  operator.  Finally,  by  overlaying  a  subroutine  of  OS  which  is  entered 
every  time  a  task  leaves  the  System,  the  NCP  is  able  to  gain  control  as  each 
applications  program  which  used  the  services  of  the  NCP  terminates.  Hence, 

NCP  resources  which  the  user  neglected  to  release,  or  could  not  release 
because  of  an  unexpected  abnormal  termination  (abend),  can  be  reclaimed. 

All  modifications  made  to  the  System  by  the  NCP  are  made  dynamically, 
and  are  removed  by  the  NCP  before  it  terminates  execution.  Thus,  the  NCP's 
passage  through  the  System  appears  exactly  like  that  of  any  other  job, 
independent  of  whether  it  terminates  normally  or  abends. 

(2)  Operator  Control 

The  NCP  provides  the  System's  operator  with  a  comprehensive  ccmnand  set 
through  which  the  NCP's  operation  can  be  monitored,  and  influenced  when 
necessary.  Such  control  capability  was  deemed  necessary  to  make  the  NCP's 
regular  insertion  into  and  removal  from  the  System  convenient  for  both 
operator  and  user.  A  portion  of  the  ccmnand  set  was  designed  primarily  for 
use  by  system's  programmers  during  checkout  of  other,  Network- related  software. 
The  operator  command  set  is  briefly  described  in  the  following  paragraphs. 

A  more  thorough  description  can  be  found  in  a  paper  entitled  "NCP  Operatoz's 
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Guide"  (4  June  71,  NIC  6797)  distributed  to  Network  participants  through 
the  NIC. 

MENU  is  a  command  provided  to  aid  the  operator  in  recalling  the  key¬ 
words  for  other  commands;  it  does  nothing  more  than  list  the  comnand  set  on 
the  operator's  console. 

The  STAT  command  returraa  one-line  summary  of  the  state  of  the  NCP, 
including  the  number  of  local  sockets  in  use  and  the  condition  of  the  IMP 
Interface.  The  operator  is  also  reminded  of  previously  issued  commands  which 
remain  in  effect  and  restrict  the  NCP's  operation. 

LIST  augments  the  STAT  command  by  providing  detailed  infoxmation  about 
all  or  a  subset  of  local  sockets.  For  each  socket  about  which  the  NCP  is 
queried,  the  following  information  Is  displayed:  the  name  of  the  job  using 
the  socket,  the  state  of  the  socket,  and  the  hoBt  and  socket  to  which  it  is 
connected. 

DRAIN  provides  the  operator  with  a  means  for  gradually  decreasing  the 
level  of  NCP  utilization  in  anticipation,  for  example,  of  an  Initial  Program 
Load  (IPL).  While  a  DRAIN  command  is  in  effect,  the  NCP  permits  no  new 
Network  connections  to  be  initiated,  and  when  the  last  existing  connection 
has  been  terminated,  so  informs  the  operator. 

STOP  causes  all  existing  connections  to  terminate  by  forcibly  closing 
them.  Those  local  processes  whose  connections  were  so  broken  are  made  aware 
of  the  reason  for  the  intrusion;  to  the  remote  processes  involved,  it  appears 
as  though  the  local  processes  had  broken  the  connections  themselves.  During 
the  termination  sequence,  the  NCP  prohibits  new  connections  from  being 
initiated.  When  all  sockets  have  been  released,  the  operator  is  so  informed. 
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The  START  command  is  provided  to  initiate  Network  operations  after  the 
NCP  is  loaded.  Issuance  of  the  command  causes  the  IMP  Interface  to  be 
initialized,  a  few  NOPs  to  be  transmitted  to  the  Interface  Message  Processor 
(IMP),  and  a  Reset  control  command  to  be  sent  to  each  known  host  in  the 
Network.  Thereafter,  START  serves  to  nullify  a  previously  issued  DRAIN  or 
STOP  command. 

CANCEL  is  provided  as  a  convenient  means  for  terminating  execution  of 
the  NCP.  CANCEL  causes  a  STOP  operator  command  to  be  simulated.  After  a 
brief  pause,  the  NCP  transmits  a  Host- Going- Down  control  message  to  its  IMP. 
Then,  after  a  second  short  pause,  the  NCP  disables  the  IMP  Interface  and 
removes  itself  from  the  System. 

The  LOG  command  enables  the  operator  to  record  a  comment  in  a  log 
maintained  by  the  NCP  on  secondary  storage.  The  text  following  the  keyword 
"LOG"  is  simply  recorded. 

TEST  is  the  first  of  a  series  of  commands  designed  to  aid  both  software 
and  hardware  debugging.  TEST  causes  all  subsequent  I/O  interrupts  generated 
by  the  IMP  Interface  (until  the  next  START  command)  to  be  logged  on  the 
operator's  console  along  with  their  source  (i.e.,  input  or  output  segment 
of  the  Interface)  and  status  information  obtained  from  the  Channel  Status 
Word  (CSW). 

To  permit  software  development  to  proceed  during  periods  when  the  IMP 
or  IMP  Interface  is  inoperative,  the  NCP  is  equipped  with  a  mode  of  operation 
in  which  the  functions  of  the  hardware  are  simulated.  Two  operator  commands  - 
INT  and  EXT  -  are  provided  to  permit  dynamic  switching  between  the  NCP's 
normally  operating  (called  external)  mode  and  this  special  (called  internal) 
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mode.  A  third  ccsrand  -  HOLD  -  aids  in  achieving  a  smooth  transition  between 
modus  by  suspending  output  to  the  IMP  Interface  until  a  START  command  is 
issued. 

RESET  causes  a  Reset  control  command  to  be  transmitted  to  one  or  more 
designated  hosts.  If  and  when  the  host  acknowledges  by  returning  a  Reset 
Reply  control  command,  the  operator  is  so  informed. 

An  operator  command  called  FIND  is  provided  to  aid  in  the  debugging  of 
applications  programs  which  use  the  services  of  the  NCP.  It  can  be  employed 
to  obtain  the  main  storage  address  of  any  program  within  the  System,  given 
its  name  and  the  name  of  the  Job  with  which  it  is  associated;  to  terminate  a 
program  by  abending  it;  and  to  take  a  dump  of  the  program. 

Finally,  the  RELOAD  command  causes  the  NCP  to  be  restarted.  A  new  copy 
of  the  load  module  is  obtained  from  secondary  storage,  and  the  NCP  is 
reentered  at  its  entry  point.  RELOAD  provides  a  means  for  completely  refresh¬ 
ing  the  software  after  a  failure  in  the  NCP,  and  a  means  for  switching 
between  two  versions  of  the  NCP,  a  developmental  version  and  a  production 
one. 

(b)  Network  Access  From  Local  Programs 

The  NCP  provides  any  assembly- language  program  within  the  360/75  with 
full  access  to  the  Network.  Such  access  is  obtained  by  issuing  an  SVC,  which 
is  intercepted  and  processed  by  the  NCP.  During  this  contract  period,  such 
access  has  been  extended  to  programs  written  in  Fortran  and  PL/l,  and  to 
On-Line  Eastern  (OLS)  users.  To  support  Fortran  and  PL/l  applications,  two 
sets  of  subroutines  have  been  written  and  placed  in  the  System's  Fortran  and 
PL/l  subprogram  libraries,  respectively,  and  hence  can  bo  invoked  from  a 
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program  coded  in  either  of  those  languages.  To  support  OLS  users,  a  third 
subsystem  (i.e. ,  in  addition  to  those  supporting  the  Card  Oriented  Language 
(COL)  and  the  Mathematically  Oriented  Language  -  Single  Precision  Floating 
Point)  of  OLS  called  NET  has  been  created  and  outfitted  with  a  set  of 
operators  which  provide  access  to  the  Network.  Both  Fortran  and  PL/l 
subroutines  and  NET  operators  perform  NCP  supervisor  calls  for  the  user. 

The  conventions  for  communicating  with  the  NCP  from  assembly  language, 
Fortran,  and  PL/l  programs,  and  from  OLS  terminals,  are  briefly  described  in 
the  following  sections. 

(1)  Assembly  Language 

A  set  of  macro  instructions  has  been  provided  to  aid  the  assembly- language 
programmer  in  invoking  the  NCP.  Each  macro  instruction  builds  a  parameter 
list,  loads  its  address  into  a  general  purpose  regie  tc r,  and  issues  the  NCP's 
SVC.  A  macro  instruction  is  provided  for  each  service  that  can  be  invoked. 

In  general,  calling  the  macro  serves  only  to  initiate  a  Network  operation; 
the  programmer  assumes  responsibility  for  assuring  the  successful  completion 
of  each  operation  by  (in  most  cases)  directing  an  OS  WAIT  macro  instruction 
to  an  Event  Control  Block  (ECB)  associated  with  the  operation,  which  is  posted 
by  the  NCP  when  the  requested  function  has  been  performed. 

The  paragraphs  which  follow  describe  briefly  the  macro  instructions 
provided.  Detailed  descriptions  of  all  but  the  most  recently  written  can  be 
found  in  "An  NCP  for  the  ARPA  Network".  The  term 'identifier"  denotes  a  five- 
tyte  field  containing  in  its  high-order  byte  a  host  address  and  in  its  low- 
order  32-bits  the  number  of  a  socket  at  that  host. 
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The  three  connection- establishing  operations  -  Connect,  Listen,  and 
Accept  -  are  invoked  by  variations  of  a  macro  instruction  called  "@0PEN", 
called  as  follows: 

@OPEN  ECP-address  [ , LSCK=identif ier^ ,  WS=address  £ , SP=subpoolJ 
[,BS=byte-size]  [,FSCK=identifier2]] 

where  operands  enclosed  in  brackets  are  optional.  If  both  LSCK  (local  socket 
identifier)  and  FSCK  (foreign  socket  identifier)  are  specified,  the  operation 
requested  is  Connect;  if  only  LSCK  is  specified,  the  operation  is  Listen; 
and  if  neither  is  specified,  it  is  Accept.  If  LSCK  specifies  a  send  socket, 
then  BS  may  be  specified  and  determines  the  byte  size  for  the  connection 
(all  valid  byte  sizes  are  supported).  If  the  operation  is  Listen,  the  NCP 
returns  the  calling,  foreign  socket's  identifier  in  the  workspace  WS.  The 
optional  operand  SP  serves  to  place  the  socket  in  a  user-defined  group  of 
sockets;  the  NCP  permits  certain  operations  to  be  applied  to  an  entire  subpool 
of  local  sockets. 

The  @CLOSE  macro  Instruction  whose  calling  sequence  is  simply: 

©CLOSE  EC&=  address 

serves  to  close  on  existing  connection,  retract  a  connection  request,  or 
reject  a  request  for  connection. 

To  transmit  data  through  a  send  socket,  the  foiled. ng  macro  is  invoked: 

©WRITE  EC&saddress,  ADDR=buffer- address 

|^,  OFFSET=buf f er-  off s  et  J ,  BITS=buffer- length 
ADDR  and  OFFSET  specify  an  address,  and  a  bit  offset  from  that  address,  at 
which  the  bit  string  to  be  transmitted  begins  (OFFSET  defaults  to  zero). 


BITS  defines  the  length  of  the  bit  string,  and  it  may  have  any  value  - 
positive,  negative,  or  zero.  A  negative  or  zero  length  implies  no  operation, 
and  any  length  string  (up  to  the  capacity  of  the  machine)  may  be  sent  in  a 
single  <§WRITE  operation.  ©WHITE'  s  are  issued  by  the  user  without  regard  to 
the  byte  size  specified  for  the  connection,  or  for  the  maximum  length  message 
transmittable  through  the  subnet.  Breaking  bit  strings  into  Network  messages 
and  properly  formatting  them  are  the  sole  responsibility  of  the  NCP. 

The  counterpart  of  ©WRITE  for  receive  sockets  is: 

©READ  ECB= address,  ADDR=buf f er-addres s 

OFFSET=buffer-offsetJ,  BITS=buffer- length 
where  the  buffer  designated  is  a  receiving,  rather  than  a  source  buffer. 
Again,  no  restriction  is  imposed  upon  the  size  of  the  buffer,  and  sole 
responsibility  for  extracting  or  concatenating  the  bit  string  from  Network 
messages,  and  for  transmitting  appropriate  allocations  to  the  sender,  rests 
with  the  NCP. 

To  transmit  an  interrupt  over  a  connection,  the  calling  sequence  is: 

©SIGNAL  EC B* address 

A  macro  instruction  is  provided  for  obtaining  information  about  a  local 
sccket,  and  called  as  follows: 

©CHECK  LSCK=identifier,  WS*address 

The  NCP  places  in  the  user's  workspace  the  following  information:  the  state 
of  the  socket,  the  identifier  of  the  socket  connected  to  it,  and  information 
about  the  amount  of  data  queued  by  -che  NCP  or  awaited  by  the  user. 

Another  variation  of  ©CHECK: 


©CHECK  NCP 
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may  be  called  by  a  user  to  determine  whether  or  not  the  NCP  is  in  tbo  System. 
The  information  is  returned  in  the  condition  code. 

The  (SID  macro  instruction  returns  the  identifier  of  the  local  socket 
currently  associated  with  a  specified  ECB: 

(SID  ECB=address,  WS=  ad  dress 

(SIDH  returns  information  about  a  specified  host.  The  calling  sequence  is: 
@IDH  WS= address 

The  workspace  contains  fields  for  both  a  host  address  and  a  character* string 
mnemonic  for  that  host;  the  user  specifies  one  and  the  NCP  the  other.  A 
field  containing  the  status  of  the  host  is  included  in  the  worksi*ace,  and  is 
always  filled  by  the  NCP. 

A  macro  instruction  called  @XCTL  is  provided  to  allow  the  user  to  change 
several  parameters  associated  with  a  socket: 


©CCTL  ECB=address 


[. 


NEW  ECB*  address 


] 


£,  NEWWS*address]  £,  NEWSP* subpool] 


Finally, 

©DISCARD  [sP=  subpool] 

closes  all  local  sockets  in  the  designated  subpool.  If  SP  is  unspecified, 
all  sockets  opened  by  the  invoking  task  are  closed. 

(2)  Fortran  and  PL/l 

Sets  of  subroutines  have  been  provided  to  enable  Fortran  and  PL/l 
programmers  to  communicate  with  the  NCP.  In  general,  a  separate  subroutine 
is  provided  for  each  function  described  in  the  preceding  section;  the  sub¬ 
routine  contains  the  assembly- language  instructions  required  to  interpret 
the  Fortran  /  PL/',  parameter  list  and  issue  the  appropriate  Network  macro 
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instruction.  The  additional  function  of  timing  the  operation  and  aborting  it 
if  not  completed  in  a  specified  interval  is  also  performed  by  the  subroutines. 

The  calling  sequences  for  the  subroutines  currently  provided  are  briefly 
described  in  the  following  paragraphs.  Complete  documentation  has  been 
generated  and  distributed  to  Network  participants  in  the  form  of  two  NWG 
Requests  for  Comments  (RFC's)  entitled,  respectively,  "Network  Fortran 
Subprograms"  and  "Network  PL1  Subprograms"  (21  April  71,  NIC  5^31  and  5832). 
The  names  of  PL/l  subroutines  are  prefixed  with  the  character  The 

arguments  in  the  calling  sequences  directly  parallel  those  in  the  correspond¬ 
ing  assembly- language  macro  instructions,  with  the  exceptions  that  they  are 
Fortran  /  PL/l  variables,  constants,  or  expressions,  rather  than  main 
storage  addresses,  and  that  a  time  interval  is  included  in  several  cases. 
Completion-code's  are  disguised  ECB's. 

Connect,  Listen,  and  Accept  are  invoked  as  follows: 

CALL  ]  OPEN  (conrpletion-code,  time  local- socket- identifier 

£,  foreign-socket-identifier]  ,  workspace]) 

A  socket  is  closed  with  the  following  call: 


CALL  J^@JCLOSE  (completion-code,  time) 

Data  is  transmitted  through  a  send  socket  with  a  subroutine  call 
of  the  form: 

CALL  jj§]wRITE  (ccmpletion-code,  buffer,  length,  time  £,  offset]) 
Data  is  received  through  a  receive  socket  with  the  call: 

CALL  £@]read  (completion-code,  buffer,  length,  time  £,  offset]) 
The  status  of  a  local  socket  is  obtained  by  invoking: 

CALL  £@]cHECK  (local- socket-identifier,  socket- status,  status- 
mnemonic,  for *ign- socket-identifier ,  bit-count) 
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At  the  assembly- language  level,  the  information  returned  by  the  NCP  is  placed 
in  a  single  workspace;  here,  however,  each  piece  of  information  is  placed  in 
a  separate  Fortran  /  PL/l  variable. 

Identifying  the  local  socket  associated  with  (at  this  level)  a  specified 
completion  code  variable  is  accomplished  by  invoking  the  following: 

CALL  3D  (completion-code-variable,  local- socket-identifier) 

An  interrupt  is  transmitted  as  follows: 

CALL  [@lsiGNAL  ^completion-code,  time) 

A  set  of  four  subroutines  in  each  language  is  provided  to  aid  in  applica¬ 
tions  involving  communication  between  a  Fortran  /  PL/l  program  and  an  OLS 
terminal.  Their  use  is  not  inherently  restricted  to  that  application,  but 
their  design  is  oriented  toward  it.  Each  operates  in  conjunction  with  a 
corresponding  OLS  operator  (all  of  which  are  described  in  the  section  follow¬ 
ing  this),  with  the  Fortran  /  PL/l  program  and  OLS  terminal  functioning  as 
co- routines. 

A  subroutine  called  Write  to  On-Line  Console  (WTOLC)  transmits  data, 
assumed  to  be  a  character  string,  for  display  on  what  is  assumed  to  be  an  OLS 
display  device.  The  calling  sequence  is: 

CALL  [@JwTOLC  (completion-code,  buffer,  length,  case,  x-coordinate, 
y-coordinate,  time) 

The  subroutine  prefixes  the  user's  character  string  with  the  OLS  carriage 
control  characters  required  to  place  the  text  at  the  indicated  position  within 
the  display  area.  The  user  may  specify  either  absolute  or  relative  (to  the 
last  display)  coordinates.  "Case"  permits  the  display  of  Greek  (case  2)  or 
user-defined,  special  characters. 
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A  subroutine  called  Read  from  On-Line  Console  (RFOLC)  is  the  counterpart 
of  WTOLC  and  is  applied  to  a  receive  socket.  It  accepts  what  is  assumed  to 
be  a  character  string  entered  by  an  OLS  user  at  his  keyboard.  The  calling 
sequence  is: 

CALL  [VJrFOLC  (completion-code,  buffer,  length,  time) 

A  third  subroutine  called  Write  to  On-Line  Console  with  Reply  (WTOLCR) 
combines  the  functions  of  the  previous  two  by  causing  both  a  message  to  be 
displayed  at  the  user's  terminal,  and  a  response  to  be  accepted  from  him. 

This  subroutine  is  applied  to  a  pair  of  sockets  of  opposite  gender.  The 
calling  sequence  is: 

CALL  WTOLCR  (send- completion- code,  s end-buffer,  length,  case, 
x- coordinate,  y-coordlnate,  time,  receive-completion-code, 
receive-buffer,  length) 

A  final  subroutine  causes  an  OLS  display  device  to  be  erased.  Its 
calling  sequence  is: 

CALL  ERASE  (completion-code,  delay-before-eraae,  delay- 
following- erase,  time) 

A  number  of  the  Fortran  routines  described  above  have  been  used  by 
programmers  at  the  MITRE  Corporation,  McLean,  Virginia,  in  producing  a 
Network  file  system  which  is  to  execute  in  UCSB's  360/75* 

(3)  On-Line  System 

A  subsystem  of  OLS  called  NET  has  been  created  to  house,  among  other 
things,  a  set  of  operators  providing  access  to  basic,  NCP  functions  from  an 
OLS  terminal.  Uiese  operators  were  heavily  used  in  early,  interactive  Network 
experimentation  with  (in  particular)  the  Rand  Corporation  in  Santa  Monica. 

All  of  these  operators  have  been  described  in  detail  in  an  RFC  entitled 


"Network  On-Line  Operators"  (21  April  71,  NIC  5833)  distributed  to  Network 
participants  through  the  NIC.  They  are  briefly  described  in  the  following 
paragraphs . 

All  of  the  following  operators  exist  on  Level  I  Real.  "Q"  designates 
a  level  0  storage  location  to  which  a  socket  is  attached  and  in  which  the 
completion  codes  for  operations  involving  that  socket  are  made  available  to 
the  user. 


The  UP  operator  is  employed  to  issue  a  Connect,  Listen,  or  Accept.  It 
requires  a  predicate  list  as  indicated  below: 

UP  Q  £ local- socket  foreign- socket  [,  host-addressJJjRETURN 

If  both  local  and  foreign  socket  numbers  are  specified,  the  operation 
requested  is  Connect  ("host-address"  defaults  to  "3");  if  only  a  local  socket 
number  is  specified,  the  operation  is  Listen;  and  if  neither  is  specified, 
it  is  Accept. 

The  sequence  for  closing  an  existing  connection,  retracting  a  connection 
request,  or  rejecting  a  connection  attempt  is: 

DWN  Q 

The  STORE  operator  causes  data  to  be  transmitted  through  a  send  socket. 


The  predicate  list 
STORE 


$ 


Q  delimiter  text  delimiter 
Q  [x]  value  RETURN 


where  braces  indicate  a  choice  between  several  forms.  The  first  form  causes 


'text"  to  be  transmitted  as  a  character  string.  The  optional  "+"  and 


control  the  formatting  of  the  text  into  logical  messages.  The  second  form 
causes  a  single  byte  whose  value  is  "value"  (specified  in  hexadecimal  if  "X" 
is  present)  to  be  transmitted. 
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LOAD  Is  the  counterpart  of  STORE  and  causes  data  to  be  accepted  free  a 
receive  socket.  Its  format  is: 

Q 


LOAD  \  ”  Q  op-code,  length  RETURN 
*  Q  number- of-bytes  RETURN 


The  first  form  causes  logical  messages  to  be  accepted  from  the  socket,  each 
preceded  by  an  op  code  and  length.  The  op  code  identifies  the  nature  of  the 
data  (e.g.,  characters  for  display,  keys  for  execution).  The  second  form 
allows  the  user  to  manually  specify  the  op  code  and  length  "n"  to  be  applied 
to  the  next  "n"  bytes  received  over  the  connection.  The  third  forms  permits 
up  to  four  bytes  of  data  to  be  accepted  and  placed  in  the  Level  0  accumulator. 

The  DISPLAY  operator  enables  the  user  to  obtain  information  about  any 
local  socket.  It  is  invoked  *n  the  following  manner. 


DISPLAY 


^  (socket-number 


RETURN... 


where  ellipses  designate  items  that  may  be  repeated  any  desired  number  of 
times.  If  the  predicate  list  is  simply  RETURN,  the  status  of  all  sockets 
attached  by  the  user's  terminal  are  displayed.  Otherwise,  information  about 
the  designated  socket  la  displayed;  successive  RETURN'S  display  successively 
higher- numbered  sockets;  successive  BACK'S  display  successively  lower- 
numbered  sockets. 

The  ID  operator  displays  the  number  of  the  socket  attached  to  a  designated 
Level  0  storage  location.  It's  calling  sequence  is: 

ID  Q... 

DEL  closes  all  sockets  attached  by  the  user's  terminal. 
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REFL  sends  an  interrupt  to  the  foreign  socket  attached  to  the  designated 
local  socket.  The  key  sequence: 

REFL  Q 

transmits  the  interrupt. 

Finally,  the  operators  ARG  and  ATAN  provide  a  convenient  means  for 
conducting  bidirectional  communication  with  another  terminal  or  with  an 
executing  program.  The  calling  sequences  are: 

ARG  [.][-]  \  ["]  and 

ATAN  delimiter 

Both  operators  are  applied  to  a  pair  of  local  sockets  (designated  by  and 
Qg)  of  opposite  gender.  Both  simultaneously  monitor  the  receive  socket  and 
user  keyboard  displaying  incoming  data  from  the  Network  connection  on  the 
user's  display  device,  and  transmitting  the  user's  keypushes  through  the 
send  socket.  ARG  provides  a  character-at-a-time  mode  without  local  echo, 
and  ATAN  a  line-at-a-time  mode  (with  "delimiter"  signalling  end  of  line) 
with  local  echo. 

(4)  User  Telnet 

A  User  Telnet  has  been  Implemented  within  OLS  to  provide  OLS  users  with 
access  to  teletype- like  time-sharing  systems  in  the  Network.  The  program 
conforms  to  the  protocol  adopted  by  the  NWG  to  handle  such  communication. 
UCSB's  User  telnet  implementation  is  described  in  detail  in  an  RFC  entitled 
"A  User  Telnet  -  Description  of  an  Initial  Implementation"  (9  August  71, 

NIC  7176)  distributed  to  Network  participants  through  the  NIC.  A  brief 
description  follows  in  succeeding  paragraphs. 
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User  Telnet  resides  in  the  NET  subsystem  of  OLS  on  Level  II  Real  under 
the  operator  LOG.  The  sequence  for  initiating  a  duplex  Network  connection 
to  a  known  system  is  as  follows: 

KEYBOARD  ENTRY  OLS  QUERY/ RESPONSE 

II  LOG  FOREIGN  SITE  NO.  -  (host-address) 

host-address  RETURN  FOREIGN  SCK  NO.  =  (socket- number) 

socket-number  RETURN 

"Host-address"  and  "socket-number"  designate  the  system  to  be  contacted.  When 
the  desired  connection  has  been  established,  OLS  will  erase  the  user's  display 
device  and  position  the  user  to  the  upper  left  corner  of  the  screen.  From 
then  on  out,  the  user  is  effectively  connected  to  the  remote  system.  Keys 
pushed  by  the  user  are  mapped  into  a  character  set  (7-bit  ASCII)  adopted 
as  standard  by  the  NWG.  Incoming  text  from  the  connected,  remote  system  is 
mapped  from  that  character  set  into  a  form  suitable  for  display  at  the  user's 
terminal. 

Ibe  mapping  adopted  between  ASCII  and  teleputer  code  is  briefly  as  follows 
CASE  on  an  OLS  keyboard  substitutes  for  the  CNTRL  key  on  a  teletype;  hence, 
for  example,  "control-C"  is  struck  by  pressing  "CASE  C".  An  alternate  means 
for  sending  control  characters  is  provided  on  the  upper  keyboard;  for  example, 
LOG  is  equivalent  to  "CASE  C".  Punctuation,  numerics,  and  upper-and  lower¬ 
case  alphabetic s  are  mapped  straight -forwardly  into  their  ASCII  equivalents. 
RETURN  and  BACK  are  mapped  into  carriage  return  and  rub  out,  respectively. 

Back  slash  is  mapped  into  "CASE  /",  left  brace  into  "CASE  (";  similar 
transformations  are  applied  to  a  number  of  miscellaneous  punctuation  characters 
Bel  is  simulated  by  displaying  "<BELL?"  in  the  upper  left  corner  of  the  screen. 
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As  a  substitute  for  the  noisy  return  of  a  teletype  carriage  (often  a 
useful  cue),OLS  displays  an  underscore  in  the  left  margin  of  the  display  area. 

Provision  has  been  made  for  allowing  the  user  to  escape  from  User  Telnet 
(by  hitting  a  level  key)  and  then  to  return  to  the  connected  remote  system 
without  penalty.  Thus  the  user  can,  for  example,  escape  from  telnet  and 
select  the  printer  as  a  secondary  output  device,  and  then  return  to  Telnet  to 
obtain  hardcopy  of  all  succeeding  dialogue  with  the  remote  system. 

The  key  sequence  "CCN  RETURN"  closes  the  user's  Telnet  sockets.  If 
the  sockets  haven't  been  closed  by  the  user  before  he  logs  off,  that  function 
is  performed  for  him  by  OLS. 

As  an  adjunct  to  Telnet,  the  user  is  provided  with  an  operator  for 
displaying  the  list  of  hosts  known  to  the  NCP.  For  each  such  host,  the 
following  information  is  displayed:  a  brief  mnemonic  for  the  host  (e.g., 
"UTAH",  "RAND",  "UCSB");  its  host  address  in  decimal,  hexadecimal,  and  octal; 
and  the  state  of  the  host  (dead  to  the  Network  or  otherwise).  The  list  is 
generated  by  pushing: 

ID  RETURN. . . 

Each  successive  RETURN  erases  the  display  device,  repositions  it  to  the  upper 
left  corner,  and  displays  another  segment  of  the  list,  until  the  last  hosts 
have  been  displayed. 

(c)  Access  to  Local  Programs  From  the  Network 

A  number  of  programs  were  written  during  this  contract  period  for  use 
by  users  at  other  Network  sites.  Some  are  interfaces  to  existing  software 
systems  at  UCSB;  others  are  service  programs  written  expressly  for  Network 
These  programs  are  described  in  the  following  sections. 


users . 
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All  currently  existing  programs  of  this  type  are  executed  in  the  same 
region  as  the  NCP.  A  subroutine  of  the  NCP  called  the  Dispatcher  was  written 
to  oversee  the  execution  of  such  service  programs  -  fetch  copies  of  them  from 
secondary  storage  when  requested  by  remote  users  or  processes,  and  release 
the  main  storage  they  occupied  when  they  terminate  execution.  Essentially, 
the  Dispatcher  is  aware  of  a  set  of  local  sockets  for  which  it  has 
responsibility.  With  each  such  socket,  it  associates  the  name  of  a  partitioned 
data  £et  (PDS)  and  the  name  of  a  member  of  that  PDS  containing  the  load 
module  for  the  program  associated  with  the  socket.  Hie  Dispatcher  always  has 
a  Listen  pending  to  each  socket  in  its  list.  When  a  foreign  process  attempts 
to  connect  to  the  socket,  the  Dispatcher  invdkes  the  appropriate  program  by 
fetching  its  load  module  from  the  PDS  and  attaching  it  as  a  subtask,  and  then 
passes  control  of  the  socket  to  it.  While  the  program  is  attached,  the 
Dispatcher  ceases  listening  on  that  socket;  as  soon  as  the  program  terminates 
execution,  the  Dispatcher  re-issues  its  Listen.  In  this  manner,  any  number 
of  processes  can  be  supported;  a  single-card  change  to  the  software  adds  to 
or  deletes  a  socket  (and  hence  a  service)  from  the  Dispatcher's  list.  The 
Dispatcher  itself  is  transparent  to  the  calling,  foreign  socket. 

(1)  On-Line  gystem,  Graphics  Supported 
Two  processes  were  written  during  this  contract  period  to  provide 
Network  users  with  full  access  to  the  On-Line  System.  Both  included  provision 
for  transmitting  curvil*  .tar  display  in  line,  dot,  dot-dot,  and  character 
modes,  and  the  first  for  transmitting  special  character  display.  The  first 
such  implementation  was  made  available  to  Network  users  shortly  after  the 
adoption  by  the  NWG  of  its  initial  Host-Host  protocol  specification,  and  was 
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never  used  by  any  but  local  users.  It  was  fully  documented  in  an  RFC 
entitled  "Specifications  for  Network  Use  of  the  UCSB  Or  '  ine  System  (16 
October  70,  NIC  5^17)  distributed  to  Network  participants.  Support  for  this 
specification  was  terminated  by  UCSB  this  August,  and  replaced  by  generation 
of  a  new  specification  to  which  the  Rand  Corporation  in  Santa  Monica  has  already 
interfaced  its  video-graphics  system.  Hence,  users  at  Rand  are  able  to 
simulate  an  OLS  keyboard  by  using  a  light  pen  or  Rand  tablet  to  select 
squares  on  a  pick  table  displayed  on  a  television  screen.  An  area  of  the 
screen  is  reserved  as  a  simulated  OLS  display  screen.  This  second  implementa¬ 
tion  does  not  yet  support  special  character  display,  but  the  inclusion  of 
that  feature  is  scheduled  for  the  immediate  future.  Also,  documentation  of 
this  second  implementation  has  yet  to  be  made  available  to  the  Network 
canmunity. 

A  brief  description  of  UCSB's  first  implementation  of  an  interface  to 
its  On-Line  System  is  now  presented.  This  implementation  simultaneously 
supported  up  to  ten  Network  users.  A  user  connected  himself  to  OLS  by  a 
hand- shaking  procedure  almost  identical  to  that  which  has  since  been  adopted 
by  the  NWG  as  the  official  Initial  Connection  Protocol  (ICP).  Keypushes, 
represented  in  OLS's  internal  character  set,  were  transmitted  by  the  user  on 
one  Network  connection.  Software  in  the  user's  host  was  responsible  for 
providing  him  with  a  means  of  generating  those  codes  from  his  terminal. 
Character,  curvilinear,  and  special- character  output  were  returned  by  OLS 
on  a  second  connection  in  the  form  of  variable-length  logical  messages,  each 
prefixed  with  op  code  and  length  fields.  An  op  code  was  defined  for  each 
of  the  three  classes  of  output.  Text  was  transmitted  in  the  Extended-Binary- 
Coded-  Decimal  Interchange  Code  (EBCDIC),  with  carriage  control  characters 
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embedded  in  the  text.  Curvilinear  display  was  transmitted  as  lists  of 
normalized,  fixed-point  x-y  coordinate  pairs,  each  coordinate  in  the  range 
[0,  4095 1  and  represented  in  sixteen  bits.  Flags  indicating  the  display  mode, 
and  the  display  character  if  character  mode.,  were  also  made  available.  Finally, 
special  character  display  vas  represented  in  the  following  manner.  The 
display  area  vas  assumed  4096  units  square,  and  a  character  frame  160  units 
vide  and  224  units  high.  Codes  were  defined  for  moving  the  beam  in  any  of 
sixteen  directions  a  distance  equal  to  that  of  a  move  in  the  specified 
direction  from  the  center  of  a  64- unit  square  to  its  perimeter.  A  change  in 
position  relative  to  the  lover-right  corner  of  the  last  character  frame  vas 
conminicated  by  providing  the  change  along  th»  x  and  y  axes  as  tvo  12-bit 
signed  integers. 

In  that  implementation,  the  user  vas  interfaced  to  OLS  at  natural  points 
in  the  softvare.  Input  from  Network  users  vat  _.*aced  in  a  buffer,  as  if  its 
source  were  a  segment  of  the  hardware  interface  to  local  users,  and  OLS 
posted  via  the  OS  post  macro  instruction.  Output  from  OLS  vas  intercepted 
between  the  terminal  display  subroutine  of  the  Basic  system  and  the  I/O  module. 
In  OLS  terminology,  the  Network  softvare  played  the  role  of  an  Output 
Enqueue  routine. 

The  second  implementation  of  a  graphics  interface  vas  the  product  of 
a  joint  experiment  with  Rand  Corporation  which  vas  initiated  in  response  to 
requests  made  for  pairwise  experiments  at  the  Network  Graphics  meeting  held 
in  Boston.  This  implementation  vas  carried  out  by  establishing  a  set  of 
graphic  orders  and  writing  a  display  encoder  which  builds  strings  of  orders 
representing  the  on-line  display.  The  set  of  orders  are: 
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1)  <£ERASE> 

2)  <origin>  <x,y> 

3)  <vector>  <x,y> 
U)  <Dot>fx,y> 

5)  <shlft>  <Size> 


Erases  the  screen. 

Positions  the  beam  at  location  <x,y> 

Draws  a  vector  from  current  beam  position  to  the  point  <x,y> 
Positions  beam  at  location  <x,y>  and  displays  a  small  dot. 
Shifts  character  size  to  that  specified. 


All  orders  are  8  bits  in  length.  Coordinate  information  is  presented  as 
l6  bit  pairs  in  unsigned  binary  fraction  format,  (ie.,  0*x<l,  0^y<l).  It  is 

assumed  that  characters  can  be  displayed  at  any  location  on  the  display  screen 
and  the  origin  is  in  the  center  of  the  character  frame.  Any  number  of 
character  sizes  up  to  256  is  supported.  All  carriage  control  characters  are 
trapped  before  transmission  and  replaced  with  origin  commands.  Beam  location 
is  recorded  at  our  end  of  the  connection  and  hence  any  sequence  of  consecutive 
carriage  control  characters  can  be  replaced  with  one  origin  command.  Any 
character  in  the  output  string  other  than  those  in  the  set  of  orders  is 
assumed  to  be  a  printable  character  and  only  those  characters  in  the  96- 
character  ASCII  set  are  transmitted.  EBCDIC  is  used  in  plane  of  ASCII  code 
for  convenience  since  both  Rand  and  UCSB  have  IBM  systems. 

<2)  Batch  Processing 

UCSB's  batch  processing  facilities  were  made  available  to  Network  users 
during  this  contract  period.  Two  independent  processes  were  written  to 
provide  this  service,  each  addressed  by  a  standard  ICP  to  a  separate  socket. 

One  supports  remote  ^[ob  entry  (RJE)  by  accepting  files  of  card  images  from 
the  Network  and  transmitting  them  to  UCSB's  Houston  Automatic  Spooling  Priority 
System  (HASP)  through  an  internal  reader.  The  second  retrieves  output  from 
remotely  submitted  Jobs  from  a  PDS  and  relays  it  to  the  Network  user.  These 
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facilities  have  been  in  use  on  a  production  basis  by  user  groups  at  the  Rand 
Corporation  in  Santa  Monica  for  a  number  of  months,  and  are  fully  documented 
in  an  RFC  entitled  "Network  Specifications  for  Remote  Job  Entry  and  Remote 
Job  Output  Retrieval  at  UCSB"  (22  March  71,  NIC  5775)  distributed  to 
Network  participants  through  the  NIC.  Brief  descriptions  are  included  below. 

Complete  card  files  are  transmitted  to  UCSB  through  the  Network,  and 
are  required  to  include  all  the  usual  Job  £ontrol  Language  (JCL)  cards.  In 
addition,  to  enable  subsequent  retrieval  of  what  would  normally  be  printed 
output,  a  special  accounting  parameter  must  appear  on  the  user's  job  card. 

UCSB  provides  the  Network  user  with  three  options  for  formatting  his  card 
file.  The  first  option  enables  the  user  to  transmit  his  file  as  a  succession 
of  variable  length  card  images,  each  delimited  by  a  break  character  of  his 
choosing.  The  second  option  allows  each  card  image  to  be  preceded  by  a  16-bit 
length  field.  In  either  case,  UCSB  pads  each  record  on  the  rignt  or  truncates 
it  to  80  characters  wherever  required.  The  third  format  requires  that  a 
file  be  transmitted  as  80-character,  fixed- length  records. 

In  retrieving  output  from  RJE  Jobs,  the  user  is  provided  with  the  follow¬ 
ing  options.  The  user  can  request  that  the  output  file  generated  by  a  job 
be  returned  if  currently  available,  or  that  it  be  transmitted  as  soon  as  it 
ib  available.  He  may  request  that  a  previously- made  output  request  be 
cancelled.  Finally,  he  may  request  that  an  output  file  be  purged.  In  all 
cases,  UCSB  returns  a  completion  code  indicating  the  disposition  of  the  user's 
request,  accompanied  by  the  name  of  the  job  to  which  the  request  applied. 

The  output  file  itself  is  always  transmitted  as  a  series  of  variable- length 
logical  records,  each  preceded  by  an  op  code  and  length  field. 


UCSB  forces  serial  access  to  both  of  the  processes  described  above, 
one  of  which  uses  an  inherently  serial  resource  -  an  internal  reader.  The 
greatest  shortcoming  of  this  service  lies  in  the  difficulty  of  providing  the 
user  with  status  information  about  his  .lob  in  the  presence  of  HASP. 

(3)  File  Storage 

During  this  contract  period,  on-line  direct-access  storage  at  UCSB 
was  made  available  to  Network  users  through  a  process  written  expressly  for 
Network  use  and  known  as  UCSB's  Simple-Minded  File  Jystem  (SMFS).  This 
process  has  been  fully  documented  in  an  RFC  entitled  "Network  Specifications 
for  UCSB's  Simple-Minded  File  System"  (26  April  71,  NIC  5834)  distributed  to 
the  Network  community  through  the  NIC.  A  brief  description  of  the  system 
follows  in  succeeding  paragraphs. 

At  pres-  nt,  one  IBM  2316  disk  pack  with  a  capacity  of  29M  8-bit 
characters  and  one  2314  disk  drive  have  been  dedicated  to  Network  use.  SMFS 
is  written  to  support  any  number  of  on-line,  dedicated  drives  without  modifica¬ 
tion.  Any  number  of  Network  users,  up  to  some  assembly-parameter  maximum 
(currently  ten)  can  be  serviced  simultaneously  by  SMFS. 

SMFS  provides  storage  for  sequential,  binary  files,  each  characterized 
by  a  36-character  filename  and  two,  optional,  36-character  passwords,  one 
regulating  retrieval  of  the  file,  the  other  regulating  its  modification  or 
deletion.  SMFS  permits  filenames  and  passwords  to  be  specified  by  the  user 
in  either  EBCDIC  or  ASCII.  Each  file  is  stored  by  SMFS  as  a  single  OS  data 
set.  The  more  important  file  operations  supported  are  described  below. 

SMFS  regards  the  reservation  of  filename,  the  assignment  of  passwords, 
and  the  reservation  of  secondary  storage  as  an  operation  distinct  from  that 


25 


of  transmitting  the  file’s  contents.  The  operation  is  called  file  allocation 
and  requires,  in  addition  to  the  specification  of  filename  and  passwords, 
that  an  upper  bound  on  the  file's  size  be  declared. 

An  operation  called  file  update  causes  a  bit  string  transmitted  by  the 
user  to  be  logically  concatenated  to  the  current  contents  of  the  file. 

Boundaries  between  successive  update  operations  are  transparent  to  the  user 
when  the  file  is  retrieved.  A  similar  operation  celled  file  replacement 
causes  the  transmitted  bit  string  to  replace  the  entire  contents  of  the  file 

An  uperation  called  file  retrieval  causes  a  segment  of  the  file  of 
specified  length  to  be  returned  to  the  user.  Successive  retrieve  operations 
return  successive  segments  of  the  file.  A  similar  operation  celled  file  space 
causes  advancement  through  the  file  without  transmission  of  data. 

An  operation  celled  file  deletion  purges  a  file,  releasing  the  filename 
and  permitting  SMFS  to  reclaim  the  secondary  storage  allocated  to  the  file. 

Finally,  an  operation  celled  file  rename  permits  re-assignment  of  filename. 

(4)  On-Line  System,  Server  Telnet 

Finally,  during  this  contract  period,  an  interface  to  the  On-Line  System 
was  created  to  function  within  the  constraints  of  the  Telnet  protocol  adopted 
by  the  NWG  to  handle  teletype- like  ccmnunication.  The  nature  of  this  protocol 
requires  that  curvilinear  and  special- character  display  be  discarded,  and 
hence  only  character  display  is  made  available  to  the  user.  However,  no  other 
limitation  is  Imposed  upon  use  of  OLS  through  this  Telnet  implementation 
(of  course,  that  one  limitation  is  in  itself  severe).  In  particular,  a 
means  is  provided  for  simulating  every  key  on  an  OLS  keyboard.  This  process 
is  addressed  as  socket  number  1  in  accordance  with  Telnet  protocol,  and  supports 
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simultaneously  any  number  of  Network  users,  up  to  sane  assembly- parameter 
maximum  (currently  five).  The  specifications  for  this  service  are  yet  to  be 
published;  a  brief  description  follows: 

OLS  keyboards  have  more  keys  than  there  are  characters  in  Telnet's 
7-bit  ASCII  character  set.  A  fairly  natural  correspondence  can  be  established 
between  elements  of  the  character  set  and  keys  on  the  lower  or  operand  key¬ 
board.  Keyr  'n  the  upper  or  operator  keyboard  are  represented  in  the  follow¬ 
ing  manner.  Each  is  designated  by  a  character  string  suggestive  of  the  key's 
function,  preceded  by  a  semi-colon  and  followed  by  a  space.  For  example, 
the  SYST  key  is  designated  ";SYSTb".  The  character  string  may  be  specified 
in  either  upper-  or  lower-case  and  may  be  abbreviated  to  any  extent  desired, 
ae  long  as  the  key  indicated  remains  unambiguous.  Hence,  ";SYb"  is  an 
adequate  representation  of  SYST.  A  semi-colon  is  represented  as  If 

a  semi-colon  appears  anywhere  within  a  character  string  representing  an 
upper  keyboard  key  (after  at  least  one  other  character),  all  previous 
characters  (excluding  the  semi-colon)  are  ignored;  typing  errors  can  thus 
be  corrected.  A  carriage  return  anywhere  within  such  a  string  has  the  same 
effect  as  a  second  semi-colon,  but  the  initial  semi-colon,  too,  is  ignored; 
hence  the  next  character  entered  is  treated  as  designating  a  lower-keyboard 
key.  As  it  becomes  clear  to  OLS  that  the  character  string  being  entered  does 
not  designate  a  key  on  the  upper  keyboard,  question  marks  are  echoed  and  the 
erroneous  characters  ignored.  Hitting  Alt  Mode  or  "?"  anywhere  within  the 
string  causes  the  remaining  characters  of  the  keyword  to  be  echoed,  provided 
the  keyword  is  uniquely  identified;  otherwise,  the  bell  is  rung. 

A  number  of  pseudo-upper-keyboard  keys  are  defined,  and  all  of  the 
conventions  for  their  entry  are  as  described  in  the  preceding  paragraph. 
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One  such  pseudo  operator  -  "SHIFT"  -  causes  the  case  of  all  alphabetics  to 
be  changed;  "A"  is  translated  to  "a"  and  "a"  to  "A",  etc.,  before  being 
processed  by  OLS.  The  pseudo-operator  "UNSHIFT"  nullified  the  effect  of 
"SHIFT".  The  pseudo  operators  "HALFDUPLEX"  and  "FULLDUPLEX"  disable  and 
enable,  respectively,  echoing  of  characters  entered  by  the  user.  The 
pseudo- operator  "PREFIX"  causes  the  character  following  it  to  become  the 
prefix  character,  replacing  The  pseudo-operator  "STATE"  causes  the 

current  prefix,  the  console  mode  (half-  or  full-duplex),  and  the  shift  state 
(on  or  off)  to  be  displayed.  Finally,  the  pseudo- operator  "HELP"  causes 
the  entire  set  of  operator  and  pseudo- ope rat or  keywords  to  be  listed  for  the 
user. 

(d)  Network  Pie- In- Ihe- Sky 

A  number  of  Network- related  projects  are  scheduled  for  the  near  future. 

A  Data  Reconfiguration  Service  (DRS)  is  currently  being  implemented  in 
cooperation  with  the  Rand  Corporation.  A  DRS  is  planned  for  implementation 
on  several  hosts,  each  such  host  being  one  with  substantial  computing  power. 
Essentially,  the  DRS  is  a  Network  process  which  services  terminal  users, 
communicating  with  them  via  the  Telnet  protocol.  The  DRS  permits  a  user  to 
define  in  a  simple  programming  language,  transformations  to  be  applied  to  the 
data  passing  through  a  pair  of  Network  connections.  Character  set  translations, 
insertion  of  literals,  and  transposition  of  fields  are  among  the  more  simple 
operations  expressable  in  the  language.  The  user  is  permitted  to  name,  store, 
retrieve,  and  apply  to  specified  connections,  the  programs  so  defined.  The 
intent  is  to  provide  users  at  hosts  with  small  amounts  of  computing  power 
(Terminal  Interface  Processors  (TIP's)  are  important  examples  of  such  hosts) 
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with  a  means  for  communicating  with  those  specialized  Network  services  which 
do  not  conform  to  one  of  the  standard  protocols  supported  by  the  users'  host 
software. 

The  DRS  being  implemented  by  UC3B  and  Rand  has  three  components:  a 
compiler  which  reduces  DRS  source  programs  to  a  simpler,  intermediate 
language,  an  interpreter  which  executes  the  program  output  by  the  compiler, 
and  a  front-end  which  interfaces  to  the  user.  The  first  component  is  being 
written  by  Rand  in  PL/l,  the  second  and  third  by  UCSB  in  assembly  language. 

UCSB's  is  also  investigating  the  possibility  of  Implementing  a  full¬ 
blown  remote  job  entry  service  to  replace  the  rudimentary  service  currently 
provided.  Such  an  implementation  requires  the  cooperation  of  HASP,  and  hence 
the  software  is  more  involved. 

Of  course,  protocol  changes  which  affect  the  NCP  will  be  implemented 
as  required.  And,  as  higher- level  protocols  are  adopted  to  standardize 
such  services  as  remote  job  entry  and  file  storage,  existing  UCSB  software 
will  be  modified  accordingly. 

II.  UCSB  On-Line  System 

Nearly  all  on-line  system  software  development  was  completed  during  the 
first  half  of  the  contract  year  as  stated  in  the  last  technical  report. 

Many  features  of  lesser  importance  were  implemented  during  the  second  half 
along  with  two  major  additions  which  are  described  below. 

It  has  become  apparent  from  recent  conversation  with  interested  parties 
that  exportation  of  the  UCSB  On-Line  System  is  constrained  by  two  major 
characteristics.  First,  the  On-Line  System  can,  in  some  respects,  be 
considered  a  "core  hog".  Whereas  many  installations  have  a  main- frame 
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sufficient  in  capacity  to  implement  our  system,  most  are  not  in  a  position 
to  allocate  a  main  storage  region  of  the  magnitude  necessary  to  support  a 
user  population  equivalent  to  ours.  To  do  this  would  mean  curtailing  other 
user  services  which  an  installation  has  strongly  committed  itself  to  provide. 
Hence,  we  have  implemented  a  swapping  facility  which  should  drastically 
reduce  the  main  storage  requirements  for  a  given  number  of  users.  We 
anticipated  well  in  advance  that  this  may  someday  become  necessary  and  took 
steps  to  provide  for  its  later  addition.  Therefore  the  implementation  of 
rollout  was  relatively  straight  forward  and  was  accomplished  in  a  short 
period  of  time.  The  On-Line  System  is  basically  a  core-resident  system. 

When  a  user  begins  a  session,  his  data  resides  on  a  direct  access  volume.  As 
he  makes  references  to  certain  elements  of  his  data  base,  those  data  elements 
are  brought  into  main  storage  where  they  remain  until  he  invokes  routines  to 
cause  their  return  to  disc.  However,  when  contention  for  main  storage 
reaches  a  predetermined  threshold,  the  user  occupying  the  most  main  storage 
is  swapped  back  to  disc  and  the  cycle  repeats  itself.  At  this  writing 
adequate  measurements  have  not  been  made  which  would  indicate  to  what  extent 
response  time  is  affected  under  certain  conditions.  However,  experiments 
have  been  conducted  using  a  l60K  work  space  shared  among  12  users  operating 
under  worst-case  conditions  for  which  response  time  was  not  affected  by  an 
intollerable  amount. 

The  second  modification  to  the  On-Line  System  concerns  the  manner  in 
which  it  time  shares.  The  question  often  asked  was  how  well  would  the  UCSB 
On-Line  System  cooperate  in  an  environment  consisting  of  many  high-priority 
interactive  computing  systems  —  the  answer  being  "not  very  well".  This  has 
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never  been  a  major  consideration  locally  since  we  are  the  sole  resident  of 
an  interactive  community.  However,  it  has  become  quite  obvious  that  the 
situation  must  be  alleviated  if  the  On-Line  System  is  to  be  truly  exportable. 

We  have  proceeded  on  the  premise  that  interactive  systems  should  police 
themselves;  ie.,  such  systems  should  guarantee  that  they  will  not  use  more  than 
a  given  percentage  of  the  central  processor  over  a  small  increment  of  time. 
Further,  this  increment  must  be  small  enou  ;h  to  allow  all  interactive  systems 
within  the  community  to  respond  quickly  to  simple  user  requests.  In 
actuality,  this  is  the  only  reasonable  approach  since  it  is  vexy  difficult 
to  police  the  activity  of  other  interactive  systems  running  under  0. S.  with 
no  knowledge  of  their  mechanism  of  time  sharing,  particularly  if  they  have 
a  higher  priority.  Hence,  the  time  sharing  monitors  were  changed  to  measure 
CPU  utilization  during  back-logged  states  and  to  assure  that  this  utilization 
does  not  exceed  a  given  threshold.  This  threshold  may  be  set  by  the  operator 
or  allowed  to  vary  according  to  a  pre-established  profile.  Thus,  the 
percentage  utilization  may  be  allowed  to  increase  during  certain  hours  of 
the  day  when  other  computing  activity  is  light. 

The  On-Line  System  time  shares  with  other  processes  by  alternating  its 
dispatching  priority  between  its  normal  value  (high)  and  zero  (lowest  possible 
priority).  Setting  the  dispatching  priority  to  zero  has  the  same  effect  as 
going  into  the  wait  state  in  that  lower  priority  processes  may  get  CPU 
cycles.  However,  there  is  the  added  advantage  that  if  no  other  lower  priority 
tasks  axe  contending  for  the  CPU,  the  On-Line  System  will  continue  executing 
(ie.,  it  is  the  only  low  priority  process  contending  for  processor  time). 

The  intervals  of  time  during  which  the  On-Line  System  executes  in  a  high  or 
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low  priority  state  are  computed  as  follows:  Normally  OLS  takes  a  100 
millisecond  time  slice  (task  time)  in  the  high  priority  state.  If  P  denotes 
the  maximum  percentage!  CFJ  utilization  for  OLS  and  H  denotes  the  high  priority 
time  slice,  then  the  interval  of  time  (L)  for  which  OLS  executes  at  low 
priority  is  computed  as  L  =  —  .  However  this  formula  does  not  take 

into  account  cycles  lost  to  higher  priority  processes.  To  allow  for  this, 
we  must  subtract  that  amount  lost  from  the  interval  we  would  normally  execute 
at  low  priortiy.  In  more  precise  terms  the  algorithm  may  be  stated  as  follows: 
Let  one  full  cycle  be  defined  as  one  interval  in  low  priority  (Ln)  followed 
by  one  interval  in  high  priority  (Hn),  Ho  be  the  normal  high  priority  time 
slice  (100  ms),  and  En  be  the  elapsed  real  time  of  the  nth  cycle.  We  now 
define 

(l-P)H  +  Ln-En 
Ln+1  "  P 

P  L 

If  Ln+1  <  0  then  Hn  is  extended  by  — —  or: 

Hn  «  Hn  -  (•—)  Ln+1 

_  P  (En-Ln) 

1-P 

Otherwise,  if  Ln+1  *  0,  then  Hn+1  is  set  to  Ho. 

While  changing  the  time- sharing  monitors  it  was  also  a  convenient  time 
to  upgrade  the  internal  scheduling.  Hence,  we  have  replaced  the  previous 
"round  robin"  scheduler  with  a  priority  scheduler.  This  was  done  by 
simply  allowing  a  hierarchy  of  "round  robin"  queues  which  are  classified  as 
follows : 


i 


Mi 


mmm 
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Q1  -  special  system  functions 

Q2  -  simple  user  requests 

Q3  -  user  programs  (foreground  mode) 

0^  -  User  programs  (background  mode) 

The  relationship  of  priorities  with  batch  programs  is  expressed  as 
P(Q1)  ^  P(Q2)>  P(Q3)  >  P(BATCH)>  P(Q4)  =  0 

The  significance  of  executing  user  programs  in  background  mode  is  that  such 
programs  get  processor  time  only  when  all  other  computer  activity  has  quieted. 
However,  users  running  in  this  mode  will  not  be  charged  for  processor  time. 

This  should  encourage  users  to  run  at  times  when  computing  activity  is 
normally  light. 

The  two  preceding  additions  are  still  in  the  implementation  phase  and 
should  become  part  of  the  prime-time  system  within  the  next  month.  Most 
development  effort  during  the  ensiling  contract  year  will  be  directed  towards 
network  activities.  At  this  writing,  the  only  planned  addition  to  OLS  will 
be  BUTTRAN  (BUTTON  FORTRAN),  a  mechanism  for  including  user  Fortran  subroutines 
as  OLS  basic  operators. 

B.  360/75  On-Line  System,  Hardware 

Maintenance  and  functioned,  improvement  of  the  UCSB  IMP-HOST  Interface 
has  continued  over  the  last  year.  Design  improvements  have  been  incorporated 
in  a  newly  designed  I.C.  version  of  the  interface  for  360  operation.  Severed 
ARPA  sites  are  interested  in  using  the  new  interface,  one  of  which  is 
presently  operational  at  MIT  Lincoln  Laboratory. 


Hardware  assistance  in  support  of  the  UCSB  On-Line  Eastern  (QLS)  has 
resulted  in  the  development  of  a  new  Multi-Line  Controller  and  a  new  version 
of  the  graphics  display  consol*"  used  on  the  QLS.  The  Multi-Line  Controller 
(MLC)  will  allow  the  attachment  of  any  type  of  user  equipment  to  our  360 
system.  The  MLC  may  find  use  at  severed  AKPA  sites  in  the  near  future.  The 
new  graphics  terminal  will  allow  a  user  to  gain  access  to  the  OLS  and  to  the 
AKPA  Network  at  large  by  means  of  an  acoustic  coupler.  Design  of  the  new 
console  has  incorporated  the  newest  in  storage  oscilloscopes  and  keyboards, 
and  now  provides  the  user  with  a  most  reliable,  inexpensive,  graphics  (and 
alphanumeric)  console. 

The  high-speed  serial  data  link  between  the  UCSB  Network  Host  (IBM  360) 
and  the  speech  analysis  computer  (SEL-810)  has  been  designed  and  all  purchasing 
has  been  completed.  This  link  will  be  made  operational  in  the  next  quarter, 
and  will  provide  the  hardware  to  link  our  speech  system  to  the  network. 

Upon  completion  of  software  development  our  speech  system  could  support 
network  users.  The  link  will  transfer  500  Kilobit/sec  serial  data  between 
the  two  sites  and  will  allow  real-time  storage  of  voice  data  at  the  360. 


C.  Speech  Project 

The  speech  research  effort  at  UCSB  to  date  has  been  directed  toward  the 
following  areas: 

1.  Development  of  a  preprocessing  method  to  filter  the  raw-speech  string 
into  sub-strings  amenable  to  wave  function  analysis. 

2.  Development  of  a  one  pass  wave-function  analysis/synthesis  system 
baaed  on  the  cosine  modulation  methods  that  will  accurately  analyze  and 
synthesize  both  male  and  female  speech  data. 

3.  Continued  studies  on  the  computer  classification  and  recognition  of 
phonetic  information  including  extraction  of  recognition  parameters  from  the 
ASC/0F  parameter  set,  single  speaker  recognition  of  (a)  steady  state  vowels, 

(b)  vowels  embedded  between  two  unvoiced  phonemes,  and  (c)  unvoiced  phonemes, 
and  detailed  studies  of  the  segmentation  of  connected  voiced  and  unvoiced 
phonemes . 

4.  Studies  of  the  data  rate  of  the  basic  ASCjfiF  representation  and  the 
amount  of  data  compression  possible  through  (a)  the  elimination  of  redundant 
wave  function  sets,  (b)  coarse  quantization  of  parameter  values ,  and  (c) 
encoding  of  parameter  sets. 

The  above  topics  are  discussed  in  more  detail  in  the  subsequent  paragraphs, 
(a)  Preprocessing  of  the  Acoustic  Waveform 

(30)* 

As  was  discussed  in  the  Semiannual  Technical  Report  the  acoustic 
waveform  must  be  preprocessed  in  some  appropriate  manner  prior  to  the  analysis 
of  the  waveform  into  its  set  of  wave  function  parameter  elements.  The 
preprocessing  operation  must  convert  the  raw  human  speech  into  a  form  that  is 


♦Superscripts  refer  to  "List  of  Publications  and  Reports"  at  the  beginning  of 
this  report . 
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amenable  to  wave  function  analysis  and  that  minimizes  the  data  rate  into  the 
wave  function  analysis  procedure. 

The  above  requirements  can  be  satisfied  by  correct  prefiltering  of  the 
acoustic  waveform  into  contiguous  bandpass  frequency  bands  of  filtered  data 
and  conversion  of  the  filtered  data  to  an  extrema  format  which  retains  the 
mininrom  information  necessary  to  describe  the  filtered  human  speech. 

Efficient  algorithms  for  these  preprocessing  steps  have  been  developed 
and  implemented  on  the  IBM-l800  speech  system.  In  addition  these  algorithms 
are  presently  being  implemented  as  part  of  the  IBM  360/75  network  speech 
facility. 

Octave  Band  Filtering 

The  filtering  algorithm  which  has  been  developed^ is  a  high  speed 
octave  band,  time  domain  convolutions  method  of  filtering  which  provides 
optimum  filtering  characteristics  and  minimum  data  rates  to  the  wave- function 
analysis  system.  The  method  is  essentially  a  fast  non-recursive  digital 
filtering  technique  in  which  over  three-fourths  of  the  coefficients  of  the 
filter  impulse  response  are  forced  to  be  zero  by  the  Judicious  choice  of  filter 
center  frequency,  band  width,  and  window.  Nearly  half  of  the  remaining 
coefficients  of  the  filter  kernel  can  be  discarded  by  taking  advantage  of 
the  symmetry  of  the  impulse  response. 

The  filter  kernel  r(nT)  which  is  employed  in  the  filtering  process  is 
given  by 


r(nt)  =  B  w(nT)  sin^ 


Trn 

T 


COS  TTn  = 
2 


-NStfSN 


(1) 


WCn) 
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Note  that  Eq.  (l)  represents  an  ideal  bandpass  filter  modulated  by  a  window 
function  w(nT).  B  is  a  gain  normalization  factor  and  w(nT)  a  window 
function  described  by  a  Gaussian  times  a  Hanning  window  as  shown  in  Eq.  (2) 

w(nT)  =  [exp[-£(^pl)  ]J[l+cos—^J  m\ 

k*0  (2) 

The  parameter  t  in  Eq.  (2)  is  the  time  width  of  the  window.  In  order  that 

the  end  values  of  the  window  coincide  with  sample  points  a  value  of  TcNT  is 
employed.  The  resulting  window  is  independent  of  the  sampling  interval  T 
as  indicated  in  Eq.  (3). 

2 

w(n)  =  exp  ]  Ll+cos  (— —)]  k*0  (3) 

When  N=2+4i  (i=0,l,2, .  .  .)  the  zero  end-points  of  the  window  will  coincide 
with  non- zero  values  of  the  kernel.  This  maximizes  the  effective  time 
width  of  the  window  while  further  reducing  the  number  of  non- zero  coefficients 
required  to  represent  the  kernel. 

Figure  1  shows  the  "minimum  coefficients"  kernel  r(nT)  for  N»l8.  Note 

that  only  five  non-zero  coefficients,  indicated  by  -the  heavy  arrows  in 

Figure  1,  are  required  to  represent  a  37  coefficient  inplus e  response. 

If  a  signal  sampled  at  a  sampling  frequency  f  is  bandlimited  at 

s 

f  ■  f  Jh  the  resulting  data  string  [x(j)3  can  be  filtered  into  octave  bands 
using  a  single  minimum-coefficients  kernel.  The  steps  required  to  accomplish 
this  (illustrated  as  magnitude  frequency  spectra  in  Figure  2)  are  as  follows: 
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a.  Perform  a  minimum-coefficients  convolution  on  the  input  data 
string  [x(j)L  The  output  sequence  [y(j)3  is  an  octave  band.  (Figure  2c) 

b.  If  the  next  lower  octave  band  is  desired,  go  to  (c),  otherwise 
termin' te. 

c.  Subtract  the  filter  output  [y(j)3  point-by-point  from  the  original 
data  sequence  to  obtain  the  "residue  data",  which  is  bandlimited  at 

f  -  f J2.  (Figure  2d) 

d.  Decimate  the  "residue  data"  by  throwing  out  every  other  point. 

This  new  sequenceCx'  (j )]  has  half  as  many  elements  as  t  ie  "original  data" 

and  is  effectively  sampled  at  f  f  /2. 

s  s 

e.  Replace  [x(j)3  with  Cx'(j)3  and  go  to  (a). 

One  advantage  of  the  above  method  is  that  the  time  required  to  filter 
out  each  successive  octave  band  is  one  half  the  time  of  the  previous  band. 
(See  (d)  above).  The  total  time  to  filter  the  "original  data"  into  M 
bands  is  thus 

r  M-l] 

^“‘hL1*****-4^  J 

-  ^  t1-^  1 

where  th  is  the  time  to  filter  out  the  highest  octave  baud.  However  two 
possible  limitations  of  the  approach  are: 

a.  The  data  strings  for  successive  bands  are  not  the  same  length  and 
hence  cannot  be  directly  summed. 

b.  All  bands  are  effectively  sampled  at  twice  the  Ifyguist  frequency 
vuich  may  not  provide  sufficient  data  resolution. 


XCf) 


before  uccination.  . 

(e)  Response  of  an  SMC  filter  stretched  once,  i.e.  K  -  1.  Note  that 
the  unwanted  spectral  repetita on  occurs  where  the  undecimated 
residue-data  is  essentially  zero. 


These  problems  can  be  obviated  by  employing  a  "stretched"  minimum 
coefficients  kernel  coupled  with  data  decimation.  In  a  "stretched"  kernel 
all  non- zero  values  are  unchanged,  but  are  located  as  if  Figure  1  were  printed 
on  a  rubber  sheet  and  then  stretched  uniformly  to  the  right  until  the  end 
point,  r(N),  is  located  as  though  N  2^  (K=0,l,2,3  ).  Thus  if  all  octave 

bands  must  have  the  same  number  of  elements  as  the  original  data  a  stretched 
minimum  coefficient  kernel  with  is  used  to  remove  the  highest  octave 
band.  The  undecimated  residue  data  are  then  obtained.  The  kernel  is  then 
stretched  (K=l)  and  the  data  filtered  again  to  remove  the  next  octave  band. 

The  procedure  is  repeated  for  K=2,3, . . .  until  all  desired  bands  have  been 
removed.  Figure  2c  shows  the  response  of  a  stretched  minimum  coefficient 
filter  for  the  case  K«l. 

Filtering  by  FFT  methods  has  a  significent  advantage  over  full-kernel 
convolution  for  all  but  very  small  kernel  sizes.  It  will  now  be  shown  that 
the  class  of  filters  described  here  are  very  competitive  with  FFT  convolution 
for  many  practical  applications.  The  one  factor  which  is  the  largest  contributor 
in  determining  filtering  time,  no  matter  which  approach  is  taken,  is  the 
number  of  filter  coefficients.  In  time  domain  convolution  the  run  time 
increases  linearly  with  the  kernel  size.  With  FFT  convolution  the  size  of 
the  kernel  determines  the  number  of  points  in  the  transform.  Table  I  shows 
the  optimum  relationship  between  kernel  size  and  transform  size,  as  provided 
by  Helms. *  If  the  number  of  kernel  points  becomes  sufficiently  large  one  may 

*Helms,  Howard  D.,  "Fast  Fourier  Transform  Method  of  Computing  Difference 
Equations  and  Simulating  Filters",  IEEE  Transactions  on  Audio  and 
Electroacoustics ,  Vol.  AU-15,  No.  2 ,  June  19 ot . 
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become  memory  bound  and  be  forced  to  use  a  sub- optimum  size  transform..  To 

avoid  discontinuities  in  doing  repetative  FFT  convolutions,  each  transform 

must  overlap  the  previous  transform  by  the  number  of  points  in  the  kernel. 

The  entire  filtering  process  is  then  completed  by  repeating  the  FFT  approximately 

2M/N-L+1  times;  where  M  equals  the  length  of  the  data  list,  N  equals  the 

number  of  points  in  the  transform  and  L  equals  the  number  of  kernel  points. 

Table  II  gives  comparative  convolution  times  for  filtering  16384  data 

points  using  four  different  techniques.  Programs  were  run  in  FORTRAN  H  on 

an  IBM  System  360/75.  The  FFT  was  Implemented  with  Sande's*  algorithm  of 

N 

Radix  2,  and  as stoning  all  real  data,  performed  a  2  point  transform  on 
2N+1  data  points. 

The  number  of  kernel  points  used  in  column  1  of  Table  II  was  chosen 
to  be  close  to  the  upper  end  of  each  range  of  those  shown  in  column  1  of 
Table  I,  therefore  simulating  worst-case  conditions  for  both  methods  of 
convolution.  The  results  in  Table  II  are  also  plotted  in  Figure  3.  From 
this  plot  it  can  be  seen  that  with  respect  to  fast  FFT  convolution  the  classical 
full-kernel  convolution  loses  its  speed  advantage  at  about  30  kernel  points. 
Half-keruel  convolution,  being  about  1.5  times  faster  than  using  a  full- 
kernel,  extends  its  speed  a/ yantage  up  to  about  60  kernel  points.  And 
finally,  the  minimum-coefficients  filter  has  speed  advantage  over  the  FFT 
convolution  up  to  about  300  kernel  points. 


♦Gentleman,  W.  M.  and  Sande,  G. ,  "Fast  Fourier  Transforms  -  For  Fun  and  Profit", 
Proceedings,  Fall  Joint  Computer  Conference,  1966,  pp.  563-578. 


TABLE  II  Filtering  times  usinc  4  different  types  of  convolution  on  16384  real  data  points. 
Confutations  were  performed  on  an  IBM  360/75  in  FORTRAN  H. 
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The  octave  filtering  band  technique  has  been  implemented  on  the  1800 
speech  system  (2  psec  memory,  16  bit  word)  using  assembly  language,  fixed 
point  arithmetic,  and  kernels  with  from  seven  to  eleven  non- zero  coefficients. 

Both  the  stretched-kernel  (for  K  =  0,1, 2,3,4)  and  the  data-decimation 
techniques  are  available  as  options  and  the  kernel  currently  being  used  has 
seven  non-zero  coefficients  (N  =  26).  Because  of  the  small  number  of  arithmetic 
operations  required,  the  accumulated  truncation  errors  appear  to  be  insignificant. 
No  in  depth  error  analysis  has  been  attempted;  however,  when  using  the  seven 
coefficient  kernel,  a  DC  signal  is  attenuated  80  db  if  double  precision 
additions  are  performed  and  70  db  when  single  precision  additions  are  used. 

A  ten  percent  speed  advantage  is  achieved  using  single  precision  arithmetic. 

In  our  application  human  speech,  bandlimited  at  3200  Hz,  sampled  at 
12.8  KHz,  and  stored  in  8192  words  of  core  memory  is  filtered  into  five 
octave  bands  in  the  following  typical  times: 


Frequency  Bands 
in  Hz. 

Filter  Time 
(sec. ) 

1600  -  3200 

2.06 

800  -  1600 

.99 

400  -  1600 

.49 

200  -  400 

.23 

100-200 

.12 

Total 

Time  3.89 

The  8K  data  list  represents  .64  seconds  of  speech  data  or  about  one-sixth 
the  filtering  time. 
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For  many  applications  where  octave  bands  are  suitable,  the  stretched 
mi  rrtimim  coefficient  family  of  digital  filters  can  provide  filtering 
algorithms  which  are  both  easy  to  implement  and  faster  than  equivalent  FFT 
convolution.  Moreover,  because  of  their  inherently  short  lag  times  and 
the  straight  forward  arithmetic  involved  these  methods  are  amenable  to 
real  time  filtering  using  mini-machines  or  special  purpose  hardware. 

(b )  Wave  Function  Analysis/ ijynthesis 
As  indicated  in  the  Semiannual  Technical  Report  two  one-pass  wave 
function  analysis/synthesis  systems  based  on  Cosine  Modulation  Methods 
have  been  developed.  These  systems  are  the  Gaussian  Cosine  Modulated  (GCM) 
Analyzer/ Synthesizer  and  the  Hanning  Cosine  Modulated  (HCM)  Analyzer/ 
Synthesizer  respectively. 

A  member  of  the  GCM  family  of  wave- functions  can  be  described  by  the 
relation  _  i2 

-Cr(t-c  )T 

(5) 


U(t)GCM=Ae 


cos  (2TTFo(t-c)-0) 


which  is  a  Gaussian  envelope  of  amplitude  A,  centered  at  time  C,  and  with 
spread  S,  multiplied  by  a  cosine  wave  of  frequency  Fq  and  phase  0  with 
respect  to  C.  An  HCM  wave  function 


U(t)HcM=  2  [1+C0ST-  (t-c)]cos  2nFo(t-c)-0)  (6) 

is  a  Hanning  window  with  parameters  A,S,  and  C  multiplied  by  a  cosine  wave. 

Both  the  GCM  and  HCM  analysis  algorithms  have  been  implemented  on  the 
1800  speech  system.  Eoth  analyzers  employ  table  look-up  procedures  and  the 
coding  has  been  optimized  so  that  a  direct  one  pass  parameter  calculation 
based  on  extrema  data  provides  quite  high  speed  analysis.  The  analysis 
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process  consists  of  generating  five  lists  of  ASC0F  parameters  which  completely 
specify  the  time  behavior  of  five  bands  (e.g.  the  octave  bands  100-200, 

200-400,  400-800,  800-1600,  and  1600-3200  Hz  respectively)  of  filtered 
speech.  From  a  comparative  viewpoint  both  the  GCM  and  HCM  analyzers  require 
about  the  same  amount  of  time  to  analyze  one  second  of  speech  (e.g.  3 
seconds  per  second  of  speech  on  the  1800  system).  The  major  difference 
between  the  two  analysis  systems  is  that  the  HCM  analyzer  generates  on  the 
average  about  five  percent  more  ASC0F  parameters  than  the  GCM  analyzer. 

Both  the  GCM  and  HCM  synthesis  algorithms  have  been  implemented  on  the  . 
1800  speech  system.  The  purpose  of  the  synthesis  systems  has  been  to 
reconstruct  speech  waveforms  from  the  respective  GCM  and  HCM  ASC0F  parameter 
lists  to  make  an  auditory  and/or  visual  (on  a  CRT)  comparison  with  the 

/ 

original  recording  of  the  acoustic  waveform  to  varify  that  the  ASC0F  / 

lists  accurately  define  the  time  domain  behavior  of  the  original  acoustic 
waveform.  Experimental  studies  over  the  last  year  involving  a  large  number 
of  speech  samples  have  verified  that  both  the  HCM  and  GCM  models  will 
accurately  represent  the  time  domain  behavior  of  human  speech  without  any 
unacceptable  errors  in  the  analysis  processes.  Both  visual  observation  of 
the  time  demain  structure  of  synthetic  speech  versus  the  original  recording 
coupled  with  aural  comparisons  of  synthetic  and  original  versions  have  shown 
excellent  agreement  in  all  cases  between  the  two.  Similar  comparisons  between 
HCM  and  GCM  synthetic  speech  have  also  indicated  excellent  agreement.  This 
is  not  to  say  that  a  perfect  match  exists  between  human  speech  and  GCM  or 
HCM  wave  function  based  synthetic  speech.  The  data  do  not  always  precisely 
fit  the  models  chosen,  so  that  only  a  good  approximation  is  being  made  at 
times.  But,  in  almost  all  samples  studied,  these  differences  were  apparent 
only  to  the  eye  and  not  the  'iar. 

i 
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In  summary,  two  wave  function  analysis/synthesis  systems  have  been 

developed,  implemented,  and  tested  on  the  1800  speech  system  which  essentially 

completes  this  aspect  of  the  speech  project.  The  GCM  analysis/synthesis 

(*) 

algorithms  have  been  described  in  an  earlier  ARPA  report.  Both  the  HCM 
and  GCM  analysis/synthesis  procedures  are  also  completely  described  by 
Carey. ^ 

(c)  Computer  Classification  and  Recognition  of  Phonetic 


Information 

The  research  program  in  the  classification  and  recognition  of  phonetic 
information  has  solidly  established  that  recognition  parameters  can  be 
extracted  from  the  ASC0F  parameter  set  and  used  to  perform  reliable  recogni¬ 
tion  of  speech.  This  has  been  demonstrated  by  the  following  accomplishments : 

1.  Development  of  a  segmentation  procedure  which  reliably  partitions 
the  input  speech  data  into  basic  voiced- unvoiced  segments. 

2.  Utilization  of  the  segmentation  algorithm  in  the  formulation  of 
a  single  speaker  vowel  recognizer  for  recognizing  steady  state 
vowels  and  vowels  embedded  between  two  unvoiced  phonemes  using 
ASC0F  data. 

3.  Extension  of  the  segmentation  algorithm  for  voiced-unvoiced  phonemes 
to  the  detection  of  unvoiced  phonemes. 

4.  Preliminary  development  of  an  unvoiced  phoneme  recognizer. 

5.  Extension  of  the  segmentation  algorithms  of  (l.)  and  (3.)  to  the 
segmentation  of  voiced  phonemes. 

Voiced- Unvoiced  Phoneme  Segmentation  and  Vowel  Recognition 

The  voiced-unvoiced  segmentation  algorithm  and  vowel-recognizer  have  been 

(30) 

described  in  detail  in  the  Semiannual  Technical  Report.  '  The  single-speaker 
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vowel  recognizer  is  a  table  drive i  recognizer  which  uses  information  supplied 
by  the  segmenter  for  the  voiced  segment  to  identify  the  10  vowels  and  two 
diphthongs.  Because  of  the  preliminary  nature  of  the  segmentation,  the 
vowels  must  be  spoken  in  the  context  of  two  unvoiced  consonants  (e.g.  vowel 
embedded  between  the  two  consonants).  The  information  for  the  development 
and  testing  of  the  recognizer  was  obtained  from  a  data  base  of  240  recordings. 
Each  vowel  was  recorded  20  times,  at  various  pitches,  between  various  consonants, 
by  a  single  speaker.  Seven  basic  features  (frequency  in  the  upper  three 
bands  and  energy  in  all  4  bands)  were  extracted  from  each  recording  and 
filed.  Using  these  parameters,  a  series  of  two-dimensional  maps  were  constructed 
to  provide  the  appropriate  vowel  clustering  and  separation.  This  information 
was  then  employed  to  structure  the  vowel  recognizer. 

The  confusion  matrix  resulting  from  the  testing  of  the  recognizer  for 
the  3et  of  240  recordings  is  depicted  in  Figure  5.  For  the  240  recordings 
there  are  only  six  cases  where  a  vowel  was  missrecognized  (e.g.  3  cases  where 
/e/  was  classified  as  /l/,  two  cases  where  /ae/  was  classified  as  /d/,  and  a 
/single  cpse  where  /£/  was  classified  as  /ss/).  This  gives  a  recognition 
score  of  97.4  percent  correct  identification  for  the  240  recordings. 

The  two  /ae,A/  errors  occurred  because  the  best  separations  between 
/ae,A/  that  could  be  obtained  had  two  cases  of  overlap  of  /ae/  into  the  /a/ 
cluster.  Similarly,  the  single  error  in  which  /£/  was  classified  as  /ae/  was 
because  the  /ae/  cluster  contained  one  /e/.  The  problem  of  separation 
between  /l,e/  was  not  resolved  in  feature  space  nor  was  a  consistent 

i 

differentiating  characteristic  found  in  the  complex  of  wave function  parameters. 
Since  these  two  vowels  overlapped  in  all  frequency  and  energy  maps,  advantage 


was  taken  of  the  fact  that  /e/  is  a  diphthong  which  the  segmenter  breaks 
into  two  voiced  segments.  Thus  the  appearance  of  two  voiced  segments  was 
used  to  separate  /e/  from  / l/ .  Thit  rather  crude  decision  element  had  the 
highest  error  rate  by  failing  to  break  three  out  of  twenty  of  /e/  into  two 
segments . 

The  vowel  recognition  score  for  a  single  speaker  was  very  encouraging. 

It  reflected  the  belief  that  meaningful  features  could  be  extracted  from  the 
ASC0F  set.  The  pitch  synchronous  frequency  and  energy  estimators  were 
condensed  into  representative  frequency  and  energy  elements  with  a  sufficient 
accuracy  that,  in  most  cases,  the  examination  of  one  or  two  of  these  elements 
enabled  the  discrimination  between  vowels  or  groups  of  vowels.  Recognition 
of  multiple  speakers,  if  possible  with  this  technique,  may  require  an 
entirely  new  set  of  crossplots  for  each  speaker  or  a  simple  adjustment  of 
the  existing  decision  elements.  In  either  case,  a  scheme  for  automatic 
table  generation  for  a  known  number  of  pattern  classes  would  be  desirable. 
Then,  once  a  speaker  had  a  table  for  his  voice,  the  mere  placement  of  his 
table  in  the  recognizer  would  adapt  the  recognition  system  to  his  voice. 

These  factors  are  being  explored  in  order  to  adapt  the  single  speaker 
recognizer  to  multiple  speakers. 

The  17-node  vowel  recognition  tree  is  fairly  efficient,  having  only  six 
more  nodes  than  the  absolute  minimum.  In  the  worst  case,  eight  nodes  must  be 
traversed  to  reach  a  terminal  branch.  A  time  measurement  was  made  starting 
from  the  time  the  parameters  were  made  available  to  the  time  when  the  entire 
word  was  processed.  The  spoken  word  was  "shot"  with  a  vowel  duration  of 
approximately  200  msec.  This  vowel  required  processing  of  eight  nodes,  and 
total  run  time  was  114  msec. 
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Unvoiced  Rionane  Segmentation  and.  Recognition 

The  unvoiced  phonemes  of  general  American  English  are:  /s/  see,  /f/  fee, 
/•/  thin,  she,  A$/  chin,  /h/  he,  /p/  pee,  /t/  tee,  /is./  key.  All  share 
one  common  property,  aperiodicity,  and  aU  except  /h/  have  energy  peaks  above 
1200  Hz.  Ideally  the  unvoiced  phonemes  can  be  separated  into  two  mutually 
exclusive  classes,  stqplike  and  fricativelike  respectively.  The  /p/,  /t/, 
and  /k/  can  exist  in  three  different  forms;  l)  as  a  plosive  in  the  initial 
position  (pan,  tan,  can),  2)  as  a  stop  in  the  final  position  (complete  closure 
and  sound  termination,  e.g.  map),  3)  as  a  stop- silence-burst  in  the  medial 
or  final  position  (complete  closure  followed  by  pressure  release).  The  /t^/ 
can  exist  as  a  (1)  or  (3),  for  if  it  occured  as  a  stop  without  release,  it 
would  sound  like  a  /t/,  e.g.  "match"  would  become  "mat".  The  presence  of 
silence  preceeding  the  burst  cm  generally  be  expected,  except  when  the  tongue, 
at  the  termination  of  the  previous  phoneme,  is  already  in  the  position  of 
the  stqplike  phoneme  (e.g.  the  /n,t/  of  "enter"). 

Ihe  fricativelike  phonemes  are  /s/,  /f/,  /©/,  /^/,  and  /h/.  These  are 
characterized  as  low-energy  noise-like  sounds,  usually  of  longer  duration 
than  the  stqplike  phonemes. 

Segmentation  of  Uhvoiced  Bionemes 

Segmentation  of  the  unvoiced  phonemes  involved  the  extension  of  the 
voiced- unvoiced  segmentation  algorithm  to  the  unvoiced  case.  This  was 
accomplished  by  modifying  the  segmenter  to  detect  silence  in  the  unvoiced 
segment. 

The  detection  of  silence  serves  &3  a  means  of  separation  of  unvoiced 
phonemes  in  a  significant  number  of  words.  Examples  of  this  are  the  /s/  and 
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/p/  of  "spin"  or  /f/  and  /t/  of  "lift".  The  /s/  followed  by  a  /p/,  /t/, 
or  /k/  is  probably  the  most  commonly  occuring  sequence  of  unvoiced  phonemes. 
Other  words,  such  as  "sticks"  or  "cliffs"  contain  coupled  unvoiced  phonemes, 
the  majority  of  these  words  being  plural  nouns.  The  existence  of  a  /p,s/  or 
/t,s/  pair  can  be  detected  by  the  preceeding  silence  but  further  separation  of 
coupled  unvoiced  pairs  has  not  yet  been  attempted  in  this  study. 

For  ary  speech  input  which  has  been  filtered  into  fixed  bands,  silence 
is  defined  as  the  lack  of  wave  function  activity  for  at  least  30  msec  in 
all  bands,  over  the  same  time  interval.  The  test  for  silence  begins  in 
Band  5,  at  the  starting  point  of  the  unvoiced  segment.  The  successive 
values  of  the  ASCOF  parameter 


icj  ■  Vi-C3 


j  s  1,2,...,  n-1 

(n  ■  number  of  wavefunctions  in  the  unvoiced 
segment) 


are  compared  with  30  msec.  If  is  greater  than  30  msec,  the  time  is 
passed  to  the  next  lower  band  and  the  test  continued  from  that  point  in 
that  band.  This  time-transfer  is  perf owned  on  all  five  bands  and  then 
repeated  again,  with  the  second  pass  beginning  at  the  largest  from  the 
first  pass.  After  the  second  peas,  the  resulting  Cj  is  the  time  of  the  last 
wavefunction,  over  all  five  bands,  which  occurred  prior  to  the  quiet  zone. 

Two  passes  are  necessary  to  avoid  the  false  detection  of  silence  in  low- 
energy  signals  such  as  in  /f/  and  /©/.  A  simplified  example  of  this  is  shown 
in  Figure  6.  Each  "x"  signifies  a  wavefunction  and  the  figure  illustrates  a 


possible  wavefunction  sequence  for  the  /f,t/  of  the  word  "raft".  The  five 
bands  are  shown  twice  to  simulate  two  passes. 
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Figure  6 


Two  pass  time  transfer  which  might  occur 
in  the  detection  of  silence  in  the  /f,t/ 
of  the  word  *'raft". 


The  silence  associated  with  a  /p/,  /t/,  /k/,  /t^/  in  the  medial  or  final 
position  does  not  extend  over  the  entire  interval  between  the  end  of  the 
voiced  segment  and  the  beginning  of  the  burst.  At  the  termination  of  the 
voiced  segment  the  signal  is  still  damping  out  for  a  short  time  before  total 
silence  exists.  An  example  of  this  is  shown  in  Figure  7  for  the  word  "thick". 

The  segment  Immediately  following  the  vowel  is  actually  classified  as 
"unvoiced"  and  the  segment  following  that  is  the  quiet  zone.  If  the  phoneme 
i\fter  the  vowel  is  a  /p/,  /t/,  /k/,  or  /t^/,  this  unvoiced  segment  can  be 
considered  as  "null",  for  it  contains  little  or  no  phonemic  recognition 
information.  However,  in  the  general  case,  the  unvoiced  phonemes  /s/,  /^/, 
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/ f / ,  and  /©/  could  also  occur  in  this  same  position  (/h/  is  restricted  to 
•  the  initial  position).  Therefore,  the  null  segment  is  treated  as  a  phoneme, 

and  a  recognition  step  is  performed  to  determine  if  the  first  unvoiced  interval 
after  the  voiced  segment  is  null  or  one  of  the  other  four  possible  phonemes. 
Besides  the  limitation  on  the  position  of  /h/,  the  only  other  constraint 
placed  on  the  unvoiced  phonemes  is  that  the  /p/,  / 1/,  and  /k/  in  the  final 
position  must  be  released,  for  the  recognition  of  these  three  will  be  based 
upon  the  data  in  the  burst  segment. 
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Figure  7 


i'iarking  of  unvoiced  phonemes  (U)  and  voiced 
segments  (v)  for  the  word  "thick". 
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Figures  8,9,  and  10  depict  typical  results  of  the  unvoiced  phoneme 
segmentation  process. 

Figure  8  is  a  very  simple  example  of  the  detection  of  silence  at  the  end 
of  the  word  "chief".  In  this  case,  each  marked  segment  corresponds  to  one 
phoneme.  Figure  8b  illustrates  the  degree  of  energy  in  each  band  for  the 
A$/  and  /f/.  Especially  note  the  very  small  signal  in  the  Tipper  two  bands 
for  the  /f/.  This  Game  low  energy  signal  can  be  seen  back  in  Figure  7  for 
the  /©/of  "thick".  This  is  due  to  the  fact  that  the  energy  peaks  for  these 
two  phonemes  are  located  well  above  the  4375  Hz  cutoff  of  Band  5. 

Figure  9  shows  how  the  phonemes  /s/  and  /t/  of  "sticks"  are  separated 
by  silence.  The  voiced  segment  contains  only  vowel  /i/,  and  the  final  segment 
contains  the  two  phonemes  /k,s/.  The  acoustic  data  looks  like  an  /s/  hut  the 
preceeding  silence  indicates  the  presence  of  a  /p/,  /t/,  or  /k/  with  the  /s/. 

Figure  10  illustrates  the  location  of  the  unvoiced  phonemes  in  the  word 
"splashed".  The  /s/  characteristically  shows  its  strongest  energy  concentra¬ 
tion  in  Band  5,  while  the  /^/  has  its  energy  components  in  bands  4  and  5. 

The  /p/  is  short  duration  with  its  largest  component  in  Band  1,  while  the  /t/ 
concentrates  mostly  in  Bands  4  and  5*  In  this  case,  the  /^/  was  recognized 
as  an  unvoiced  phoneme  and  not  a  null  segment. 
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Figure  10  l'terkcrs  indicating  the  location  of  the 
unvoiced  phonemes  (U)  and  the  voiced 
segment  (V)  in  the  word  "splashed". 

Unvoiced  Fhoneme  Recognition 

Using  the  information  supplied  by  the  segmenter  for  the  unvoiced  segment 
a  prototype  single  speaker  unvoiced  phoneme  recognizer  was  implemented  to 
identify  the  9  unvoiced  phonemes.  Ihe  information  for  the  decision  tree 
in  the  recognizer  was  obtained  from  a  data  base  of  300  recordings.  Eacn 
unvoiced  phoneme  was  recorded  at  least  20  times  at  various  pitches  by  a 
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single  speaker.  A  feature  list  like  the  one  depicted  in  Figure  11  is 
extracted  for  each  unvoiced  segment  of  a  word  and  filed.  Using  these  features 
a  series  of  two-dimensional  maps  were  constructed  to  provide  the  appropriate 
unvoiced  phoneme  clustering  and  separation. 


Rioneme  Class  (vowel,  fricative,  stop,  nasal,  etc.) 


General  Segment  ID  (Voiced,  Ubvoiced) 


ement  into  feature  list 


egment  starting  time 


Segment  stop  time 


Band  1  Ene 


Band  2  Energy 


Band  3  Energy 


Band  4  Energy 


Band  5  Energy 


Average  Band  1  Weighted  Frequency 


Average  Band  2  Weighted  Frequency 


Average  Band  3  Weighted  Frequency 


Average  Band  4  Weighted  Frequency 


Average  Band  5  Weighted  Frequency 


Figure  U  Format  of  feature  list  generated 
for  each  segment. 


Typical  examples  of  the  two-dimensional  clustering  are  shown  in  Figures 
12,  13,  and  l4.  Figure  12  shows  the  decision  element  for  separating  the 
unvoiced  phoneme  /k/  (K)  from  the  unvoiced  phonemes  /^/  and  A$/  »  and  C 
respectively).  A  second  example  is  illustrated  in  Figure  13  where  the 
unvoiced  phonemes  /t/  (T)  and  /s/  (S)  exhibit  separation.  Figure  l4 


illustrates  a  third  example  of  two-dimensional  clustering  where  the  unvoiced 
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phoneme  /p/  (P)  has  been  separated  from  the  phoneme  /t/  (T). 

The  decision  tree  being  employed  for  the  prototype  unvoiced  phoneme 
recognizer  is  shown  in  Figure  15.  All  decision  elements  are  either  one  or 
two  demensional,  based  on  cluster  formations  similar  to  those  of  Figures 
12  -  14.  Note  that  the  tree  contains  26  nodes  and  Figures  12-14  correspond 
to  nodes  7,  15,  arl  20  respectively. 


Figure  12  Separation  of  unvoiced  phoneme 
photo  /k/  (K)  from  /y  (X)  and  /ty 

(C)  using  Band  5  energy  and 
Band  1  energy. 


voiced  segment  and 
Band  4  energy. 


Figure  l4  Separation  cf  /p/  (P)  from 
photo  /t/  (T)  using  Band  4 

energy  and  Band  5  energy. 


/t//K//f//0//h//t\//\/  |  /t//K//f//«//h//p//s/ 


Figure  15  Decision  Tree  for  Single  Speaker  Unvoiced  Phoneme  Recognizer 
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Some  initial  recognition  results  ootained  from  the  recognizer  are  shewn  in 
Figures  16  -  18.  Figure  16  illustrates  the  correct  recognition  of  the  two 
unvoiced  phonemes  /©/(TH)  and  /k/  (KK)  in  the  word  "thick".  In  addition  the 
vowel  /i/  (II)  has  also  been  correctly  recognized  since  the  vowel  recognizer 
has  been  included  •-  part  of  the  unvoiced  phoneme  recognizer.  Figure  17  shows 
correct  identification  of  the  phonemes  /k/  (KK),  /i/  (II)  and  /k/  (KK)  in  the 
word  "kick".  Figure  18  shows  correct  recognition  of  the  unvoiced  phonemes 
/s,P,JJ,t/  in  the  word  "splashed".  Note  that  the  voiced  segment  /l,a e/  has 
been  missrecognized  as  the  vowel  /u/  (UH).  His  i„  due  to  the  fact  that  the 
segmentation  program  has  no  provisions  for  segmenting  connected  voiced  phonemes. 


TH  II  M(  KK  H  •« 

Figure  16  Correct  recognition  of  the  phoneme  Figure  17  Correct  recognition  of  the 

photo  string  /©,I,k/  (TH,II,KK)  in  the  photo  phonemes  /k,I,k/  (KK,II,KK) 

word  "thick".  /  in  the  word  "kick". 


Figure  18  Recognition  of/>the  unvoiced 

photo  phonemes  /s,p,\,t/  (SS,PP, 

SH,TT)  in  the  word  "splashed" 
The  voiced  segment  /l,ae  / 
has  been  missrecognized  as 
the  vowel  /u/  (UH)  since  no 
provisions  for  segmenting 
and  recognizing  connected 
voiced  phonemes  exist  in  the 
recognizer. 
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Segmentation  of  Voiced  Phonemes 

Voiced  phoneme  segmentation  basically  involved  the  extension  of  the 
unvoiced  phoneme  segmentation  algorithm  to  isolate  the  clusters  of  voiced 
phonemes  existing  in  the  voiced  segment  generated  by  the  segmenter.  Clusters 
of  voiced  phonemes  can  exist  as  much  longer  strings  than  the  unvoiced  phonemes. 
For  this  reason,  the  phonemic  coupling  can  become  vey  complex  and  the 
boundaries  very  subtle.  Therefore,  the  following  preliminary  constraints 
were  established  with  regards  to  segmentation: 

1)  The  voiced  portion  of  any  input  word  must  no+  contain  vowel-pairs 
or  consonant-pairs. 

2)  The  allowable  voiced  phoneme  set  consists  of  the  twelve  vowels, 
and  the  consonants  /m/  me,  /n/  no,  /n/  song,  /y/  yes,  /l/  late, 

/r/  row,  /w/  wet,  /b/  bet,  /d/  done,  /g/  get,  /v/  vote,  /z/  zoo, 

/th/  them. 

Phonemic  activity  occurring  within  a  word  can  be  predicted  by  performing 
an  alignment  of  the  regions  of  change  within  each  filter  band.  Regions  of 
change  are  locations  where  the  amplitude  of  the  signal,  actually  the  outer 
envelope  of  the  signal,  goes  from  large  to  small,  or  vice  versa.  The  outer 
envelope  of  each  filter  band  during  a  voiced  segment  Is  generated  during  the 
pitch- synchronous  data  reduction  step.  The  envelopes  for  the  first  four  bands 
of  "yellow"  are  shown  in  Figure  19 .  Some  envelopes  contain  peaks  and  valleys 
and  it  is  not  too  difficult  to  determine  transition  regions  in  Bands  2,  3,  and 
4.  However,  the  alignment  of  these  regions  is  not  a  simple  process  to  automate. 
The  transition  into  the  first  peak  in  Band  1  occurs  somewhat  earlier  than 
those  of  Bands  2  and  3*  The  transition  out  of  the  first  peak  of  Band  4  is 
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occurring  at  the  peaks  of  Bt'-nds  2  and  3*  Examples  such  as  these  can  be  found 
in  a  great  number  of  words,  making  the  job  of  alignment  non- trivial.. 


Band  1 
Band  2 


Band  3 
Band  4 


Figure  19  Envelopes  of  the  first  four  bands  of 
the  word  ’yellow". 

A  most  interesting  feature  of  the  envelopes  is  that,  in  most  cases,  the 
major  peaks  correspond  to  vowels  and  the  valleys  occur  at  the  location  of 
non- vowels.  Ibis  is  to  be  expected  since  the  vowels,  with  their  multiple 
resonant  peaks,  have  significant  energy  in  at  least  three  of  the  first  four 
frequency  bands,  whereas  the  voiced  consonants  are  lower  energy  phonemes. 

Since  most  of  the  fonnant  movement  is  taking  place  in  the  region  covered  by 
Bands  2,  3,  and  4  the  peak- valley  phenomena  is  most  obvious  in  these  bands. 

The  Band  1  envelope  does  not  exhibit  much  change  unless  a  vowel  is  next  to  a 
low  energy  consonant  such  as  /b/,  /d/,  or  /g/. 

To  eliminate  the  alignment  problem  associated  with  multiple  bands,  the 
envelopes  from  Bands  2,  3,  and  4  were  added  together  to  form  a  single  composite 
envelope  which  could  be  used  to  determine  the  transitions  between  peaks  and 
valleys.  Ibis  addition  also  has  the  advantage  of  accentuating  the  difference 
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between  the  peaks  and  valleys.  Figure  20  shows  sane  examples  of  the  3-band 
composite  envelopes  for  the  words  "yellow",  "leaving",  and  "alone".  Recall¬ 
ing  constraint  number  (l),  it  should  be  noted  that  vowel  clusters  or  voiced 
consonant  clusters  occur  as  one  large  peak  or  one  wide  valley. 

To  reduce  the  detection  of  false  peaks  and  valleys  due  to  the  sometimes 
uneven  contour  of  this  new  envelope,  it  is  subjected  to  a  smoothing  process. 
This  is  done  by  a  program  which  performs  a  recursive  5-point  smoothing  of  the 
composite  envelope.  The  smoothed  composite  envelopes  for  "yellow",  "leaving", 
and  "alone"  are  shown  in  Figure  21. 


(c) 

Figure  20  Composite  envelope  resulting  from  the  summation  of 
the  individual  envelopes  from  Bands  2,  3,  and  4  for 
the  words  (a)  "yellow",  (b)  "leaving",  (c)  "alone". 
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Figure  21  Smoothed  (Line)  and  Squared 
(Dots)  3-band  composite 
envelopes  for  the  words: 

(a)  "yellow",  (b)  "leaving", 
(c)  "alone". 


Based  upon  this  single  envelope,  phoneme  boundaries  are  determined  to  be 

somewhere  along  the  transitions  between  peaks  and  valleys.  A  four-point  check 

. 

is  vised  in  the  determination  of  a  peak  or  valley.  If  Xj>  and  Xj>  Xj+1  and 
xj+l>  xj+2*  then  a  peak  of  aa!P^‘tude  P  *  x j  exists.  If  Xj<x^_1  and  Xj<Xj+1  and 
Xj+1<Xj+2,  then  a  valley  of  anplitude  V  ■  x^  exists. 

In  order  for  an  envelope  transition  to  qualify  as  a  phoneme  boundary,  the 
distance  between  a  valley  and  the  proceeding  peak,  or  a  peak  and  the  proceeding 

V 

valley  must  be  sufficiently  large.  This  distance  is  a  function  of  the  data 

itself  and  depends  upon  whether  the  transition  is  peak-to- valley  or  valley- to- 

peak.  If  the  current  valley,  V,  is  less  than  .5P  (previous  peak),  then  a 

P+3V 

boundary  marker  is  placed  on  the  transition  at  the  point  where  x^=  — .  If 
the  current  peak,  P,  is  greater  than  1.25V  (previous  valley),  then  a  boundary 
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marker  is  placed  on  the  transition  at  the  point  where  The  values 

of  the  various  constants  were  determined  empirically. 

Voiced  segments  at  the  beginning  and  ending  of  words  introduce  problems 
when  segmentation  is  performed  using  the  composite  envelope  approach.  If 
the  voiced  segment  at  the  beginning  of  a  word  is  not  preceeded  by  a  definite 
valley  a  problem  arites  in  determining  whether  there  are  one  or  two  phonemes 
contained  within  the  peak.  In  addition  the  segmenter  must  determine  whether 
the  voiced  segment  begins  with  a  vowel  or  a  consonant.  To  solve  this  dilerma, 
the  following  steps  are  performed: 

(1)  square  the  3-band  composite  envelope. 

(2)  on  the  squared  data,  find  the  maximum  slope  between  the  beginning 
of  the  segment  and  the  first  peak. 

(3)  project  that  slope  down  to  the  time  axis. 

(4)  if  the  time  where  the  projection  crosses  the  axis  is  greater  than 
the  starting  time  of  the  segment,  then  the  segment  begins  with  a 
consona  «t . 

(5)  if  the  cime  where  the  projection  crosses  the  axis  is  less  than  the 
starting  time  of  the  segment,  then  the  segment  begins  with  a  vowel. 

When  it  is  determined  that  a  segment  begins  with  a  consonant,  the  half¬ 
way  point  between  the  beginning  of  the  segment  and  the  first  peak  of  the 
3-band  composite  (not  squared)  is  designated  as  the  boundary  between  the 
consonant  and  the  vowel. 

The  reasoning  behind  this  method  of  projection  is  the  consonants,  with 
their  lower  energy  in  Bands  2,  3,  and  4,  will  cause  the  composite  envelope 
(after  squaring)  to  start  out  at  a  low  amplitude  and  rise  rather  steeply  to 
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the  first  peak.  The  vowels,  on  the  other  hand,  will  start  out  at  a  higher 
amplitude  and  therefore  have  a  more  shallow  slope.  The  effects  of  squaring 
the  3-hand  composite  envelope  were  included  in  Figure  21. 

The  end  of  the  voiced  segment  poses  the  same  problem  as  the  beginning. 

Does  the  segment  end  with  a  vowel  or  a  consonant?  The  same  projection  technique 
is  used  at  the  end,  but  with  slight  modification.  The  steps  are: 

(1)  on  the  3-band  composite  envelope  (not  squared)  find  the  maximum 
slope  between  the  end  of  the  segment  and  the  half-way  point  to  the 
last  peak. 

(2)  project  the  slope  to  the  time  axis. 

(3)  if  the  time  of  the  axis  intercept  is  less  than  the  time  of  the  3rd- 
to-the-last  pitch  period,  the  segment  ends  with  a  consonant. 

(4)  if  the  time  of  the  axis  intercept  is  greater  than  the  time  of  the 
3rd-to-the-last  pitch  period,  the  segment  ends  with  a  vowel. 

When  it  has  been  determined  that  a  segment  ends  with  a  consonant,  the 
boundary  marker  is  placed  at  the  point  of  maximum  slope  from  step  1.  In  the 
ending  conditions,  the  unsquared  envelope  is  used  because,  at  the  end  of  a 
word,  the  vowels  tend  to  taper  off  slowly  in  energy  and  therefore  do  not  stop 
abruptly. 

The  following  four  figures  demonstrate  the  effectiveness  of  the 
segmentation  for  a  male  speaker.  The  voiced  phonemic  segements  are  identified 
with  a  "V"  for  vowellike  or  a  "VC"  for  voiced  consonant.  Unvoiced  phonemes 
are  designated  by  a  "U".  Tbe  vertical  lines  on  the  figures  are  not  to  be 
thought  of  as  exact  phoneme  boundaries,  but  rather  as  approximate  locations 
of  transition  regions. 
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Figure  22  illustrates  how  the  /d/,  /m/,  /n/  and  final  /n/  of  "dimension", 
by  a  male  speaker,  have  been  correctly  classified  as  voiced  consonants,  and 

A 

the  /i/,  /£/,  and  /A/  as  vowellike.  The  /^/  is  also  correctly  classified  as 
unvoiced.  Figure  23  shows  the  correct  voiced  phoneme  segmentation  for  the 
word  "leaving".  Figure  24  pictures  the  vowel/consonant  separation  for  the 
word  "yellow",  by  a  male  speaker.  Figure  25  shows  that,  for  the  string  of 
six  coupled  phonemes  in  the  word  "redeemer",  the  phoneme  boundary  markers  Me 
correctly  located  and  the  phonemes  properly  classified. 


II  |i  i  i 

V  V  VC  v  vc  u  *  >C 

figure  22  Separation  of  the  vowellike 
sounds  (V)  from  the  voiced 
consonant  sounds  (VC)  for  the 
word  "dimension"  by  a  male 
speaker. 


V  VC  V  VC  V  \c 

figure  24  Separation  of  the  vowellike 
sounds  (V)  from  the  voiced 
consonants  (,VC)  in  the  word 
"leaving",  by  a  male  speaker. 


vc  V  \c 


Figure  23  Separation  of  the  vowellike 
sounds  (v)  from  the  voiced 
consonant  sounds  (VC)  in  the 
word  "y®How",  by  a  male 
speaker. 


uve  v  vc  v  >* 


Figure  25  Separation  of  the  vowellike 
sounds  (V)  from  the  voiced 
consonant  sounds  (VC;  in  the 
word  "redeemer",  by  a  male 
speaker. 


fc.fi>  w&fc! 


71 


The  segmenter  described  in  this  section  represents  a  preliminary  version 
of  the  final  segmenter  that  will  be  used  in  a  single- speaker  phoneme  recognizer. 
The  segmenter  is  currently  being  tested  on  a  wide  spectrum  of  words  in  order 
to  measure  its  effectiveness  and  to  modify  it  wheie  necessary.  A  data  base 
of  157  words  has  been  created  up  to  this  point  in  time.  The  basic  features 
of  frequency  and  energy  in  each  frequency  band  have  been  extracted  and  entered 
into  appropriate  files  for  the  voiced  and  unvoiced  segment  of  each  of  these 
words.  These  features  will  be  utilized  to  structure  the  single  speaker 
phoneme  recognizer  when  the  segmenter  has  been  finalized.  Note  that  the  major 
effort  required  in  the  recognizer  will  be  the  recognition  of  the  voiced 
phonemes  since  algorithms  have  already  been  established  for  vowel  and 
unvoiced  phoneme  recognition. 
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(d)  Data  Compression  Studies 

The  end  objective  to  data  compression  of  speech  is  that  all  utterances 


% 
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or  sounds  still  be  intelligible.  This  criterion  provides  a  necessary  and 
sufficient  test  for  all  ideas,  and  is  quantitatively  measurable. 

The  basic  speech  waveform  is  bandllmlted  to  .1  -  4.5  KHz.,  and  the 
five  sub-bands  employed  are  .1  -  .4,  .4  -  .8,  .8  -  1.6,  1.6  -  3*2,  3.2  - 
4.5  KHz. 


Three  general  areas  of  study  have  been  emphasized  in  the  compression 
studies  completed  to  date.  The  first  is  the  tabulation  of  the  data  rate  of 
the  "highest  fidelity"  ASC0F  representation.  This  involves  making  the 
best  possible  fit  to  the  speech  waveform,  in  the  time  domain,  with  Gaussian 
wave- functions  and  determining  the  required  bit  rate  without  further 
compression.  The  bit  rate  is  estimated  by  computing  the  number  of  wave- 
functions  required  and  assuming  an  average  value  of  eight  bits  for  each  of 
the  five  parameters.  For  example:  the  word  "Unite"  [420  msec  long,  filtered 
into  five  sub-bands]  was  found  to  require  826  wave-functions.  The  bit  rate 
is  found  by: 

bit  rate  -  826  («eve-functioM)  *  r  (pfr££«)  x  1.20  We)' 


=  78,666  bits/sec 

Studies  have  shown  bit  rates  for  full  words  on  the  order  of  45  -  100,000 
bits/sec  for  the  raw  ASC0F  stream.  These  results  cure  somewhat  higher  than 
estimated  previously;  this  is  the  direct  result  of  including  an  additional 
band  (3.2  -  4.5  KHz)  which  was  found  to  be  necessary  for  fricatives.  The 
above  bit  rates  improve  somewhat  for  phrases,  as  the  ASC0F  representation 
automatically  ceases  for  any  quiet  period,  regardless  of  how  short. 
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The  second  general  area  of  study  has  been  an  attempt  to  determine  how 
many  of  these  wave-functions  are  superfluous  from  a  perceptual  standpoint. 

The  elimination  of  redundant  parameter  sets  is  extremely  simple  to  implement 
(given  a  criterion  for  redundancy),  and  seemed  the  logical  place  to  begin. 

Two  techniques  are  employed  to  reduce  the  ASC0F  stream  to  a  minimum  level 
while  still  retaining  the  ability  to  reconstruct  a  satisfactory  replica  of 
the  sound  based  only  on  the  wave  functions  remaining. 

A  sort  program  has  been  implemented  to  select  the  "principal”  (the 
amplitude  of  a  wave-function  being  a  local  maximum  or  within  80$  of  one 
of  its  neighbors)  wave- functions  in  the  five  frequency  bands.  In  the  lower 
two  bands,  most  of  the  wave- functions  qualify  as  principals;  however,  there 
is  considerable  redundancy  in  the  higher  three  bands.  The  second  technique 
discards  ASC0F  sets  with  an  amplitude  less  than  a  (normalized)  threshold. 

One  procedure  normalizes  with  respect  to  the  largest  amplitude  in  the  band 
and  sets  a  threshold  test.  This  allows  a  consideration  of  the  possible 
differences  in  amplitudes  from  band  to  band.  Another  procedure  is  to  normalize 
with  respect  to  the  largest  amplitude  in  a  moving  time  window  and  employ  a 
threshold  test.  For  a  group  of  test  words,  the  percentage  of  ASC^F  sets 
remaining  after  executing  the  preceding  routines  ranged  from  13.7  -  30$. 

The  third  general  area  of  study  has  been  the  encoding  of  the  remaining 
ASC{fc  sets.  Assumed  in  the  bit  rate  calculations  for  the  original  raw  data 
were  8  bits  per  parameter.  With  the  final  criterion  being  intelligibility, 
an  attempt  is  being  made  to  determine  the  minimum  number  of  bits  necessary 
for  each  parameter.  A  summary  of  results  obtained  to  date  are  as  follows: 
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For  amplitude,  3  bits  are  being  used  to  quantize  into  8  levels.  Non- 
uniform  quantization,  with  an  emphasis  on  the  lower  values  of  amplitude 
was  found  to  be  more  acceptable  than  equal  spacing  between  levels. 

For  spread,  0  bits  are  now  being  used.  It  was  found  acceptable  to  use 
one  value  for  spread  in  each  of  the  five  bands.  The  values  being  used 
are  8.1,  6.5*  2.5,  2.5,  1.8  msec.  These  values  remain  the  same  for 
all  words. 

For  centers,  Ac  and  not  C  is  quantized  using  8  bits.  A  coarse- fine 
technique  is  used  with  the  fine  quantization  (0  -  12.8  msec)  accurate 
to  the  nearest  .1  msec  and  the  course  quantization  (12.8  msec  -  128  msec) 
accurate  to  within  1  msec. 

For  phase,  1  bit  is  being  used.  It  was  found  that  quantizing  phase 
often  resulted  in  a  "scratchy"  quality.  This  was  determined  to  be  a 
function  of  the  interdependence  of  phase  and  center  location.  The 
remedy  was  to  quantize  phase,  and  then  by  holding  the  time  value  of  the 
maximum  extrema  constant,  calculate  a  new  corresponding  center  location. 
This  eliminated  any  "scratohiness"  by  matching  phase  and  center.  The 
center  location  was  found  to  adjust  up  to  1.5  msec  for  some  words. 

For  frequency,  it  was  found  necessary  to  use  1  bit  for  the  lower  three 
bands  and  2  bits  for  the  upper  two  bands. 

Results  of  test  words  used  s.ppear  in  Tables  1  and  2.  Recognition  at  the 
reduced  levels  are  quite  good,  however  no  formal  testing  has,  to  date,  been 


done. 
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Studies  are  presently  being  conducted  into  possible  repeating  of  ASC0F 
parameters  for  a  voiced  region.  Preliminary  results  are  encouraging  and  a 
further  reduction  of  500  -  1200  bits/sec  per  word  would  seem  possible. 
Procedures  for  the  final  testing  of  intelligibility  are  currently  under 
consideration. 

Conclusion 

Research  accomplished  has  been  in  consonance  with  the  tasks/objectives 
assigned.  The  research  will  form  the  foundation  to  continue  research  in 
On-Line  computing  and  human  speech  as  a  mode  of  communicating  with  a  computer. 
This  research  will  be  conducted  under  contract  DAH00U-71-C-0O43. 

Submitted  by: 
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